University of Oslo

We derive essential elements of quantum mechanics from a parametric
structure extending that of traditional mathematical statistics.
The basic setting is a set $\mathcal{A}$ of incompatible experiments, and
a transformation group G on the Cartesian product $\Pi$ of the parameter
spaces of these experiments. The set of possible parameters is
constrained to lie in a subspace of $\Pi$, an orbit or a set of orbits of
G. Each possible model is then connected to a parametric Hilbert
space. The spaces of different experiments are linked unitarily, thus
defining a common Hilbert space H. A state is equivalent to a question
together with an answer: the choice of an experiment $a \in \mathcal{A}$ plus
a value for the corresponding parameter. Finally, probabilities are 
introduced through Born's formula, which is derived from a recent 
version of Gleason's theorem. This then leads to the usual formalism 
of elementary quantum mechanics in important special cases. The 
theory is illustrated by the example of a quantum particle with spin. 

1. Introduction. Both statistics and quantum theory deal with predic- 
tion using the concept of probability. Historically, the difference between the 
two disciplines has been large, but in the last few years it has diminished, 
not least due to the recent work by Barndorff-Nielsen, Gill and Jupp
[7].

The lack of contact between the two disciplines is of course related to the 
difference in foundation, but one of the aims of the present paper is to argue 
that to a certain extent, this difference in foundation can be overcome. This 
may perhaps at first be difficult to believe: In statistics, the state of a given 
system is given simply by a probability measure on some measurable space. 
In quantum theory in its most common formulation the state of a system is 
given by a vector v in some abstract Hilbert space. As a continuation of this
formal theory, each observable is linked to a self-adjoint operator T on the 
same Hilbert space in such a way that the expectation of this observable in 
the state v is given by (v,Tv). Associated with this is Born's formula: The 
transition probability from state u to state v is of the form $|(u,v)|^2$. Also, in
the absence of what physicists call superselection rules, linear combinations
of state vectors form new state vectors, which lead to interference phenomena
unknown to classical statistics. 

The Born formula allows physicists to compute probabilities for sets of
outcomes, perhaps as a function of certain parameters. Statistical methods 
can then be used for inference about these parameters, as discussed in [7]. 
By contrast, the present paper aims at giving a statistical interpretation of 
the vectors v themselves. If parameters are introduced as in op. cit., the total 
model will be similar to the hierarchical models used in Bayesian statistics. 
We will not use these latter kinds of parameters in the present paper. Our 
parametric models will be of the simplest kind, but we will emphasize that 
the choice between different experimental questions to focus upon also may
imply a choice between different parametric models. 

The quantum formalism as such is the result of a long development within 
physics, starting with discoveries by Max Planck, and where contributions 
have been made by Bohr, Pauli, Schrödinger, Heisenberg and many others.
There are many good books on quantum theory, for instance, [39], where 
also some of the philosophical background is discussed. 

Many authors have tried to find deeper foundations leading to the formal- 
ism of quantum theory. Several mathematical approaches are discussed in 
[60]. One such approach is quantum logic, treated in detail by Beltrametti
and Cassinelli [12]. 

The earliest book on the mathematical foundation of quantum mechanics 
is [58]; in English translation, [59]. This book has had great influence; in its 
time it constituted a very important mathematical synthesis of the theory of 
quantum phenomena. The book can also be considered to be a forerunner of 
quantum probability. For physicists, von Neumann's book was supplemented 
by the book of Dirac [24], which started the development leading to modern 
quantum field theory. 

The development of quantum probability as a mathematical discipline, 
continuing the more formal development of quantum theory, was started 
in the 1970s. A first important topic was to develop a noncommutative
analogue of the notion of stochastic processes; see [1] and references therein.
Other topics were noncommutative conditional expectations and quantum
filtering and prediction theory ([10] and references therein). 

Quantum probability was made popular among ordinary probabilists by 
Meyer [45]. A related book is [49], which discusses the quantum stochastic 
calculus founded by Hudson and Parthasarathy, but also many other themes 
related to the mathematics of current quantum theory. An example of a symposium
proceedings volume aiming at covering both conventional probability theory
and quantum probability is [2]. 

There are also links between quantum theory and statistical inference 
theory. A systematic treatment of quantum hypothesis testing and quantum 
estimation theory was first given by Helstrom [37]. In [38] several aspects of 
quantum inference are discussed in depth; among other things the book con- 
tains a chapter on symmetry groups. A survey paper on quantum inference 
is Malley and Hornstein [43]. 

As an example of a particular statistical topic of interest, consider that
of Fisher information. Since a quantum state ordinarily allows several experiments,
this concept can be generalized in a natural way. A quantum
information measure due to Helstrom can be shown to give the maximal 
Fisher information over all possible experiments; for a recent discussion see 
[6]. 

One can thus point to several links between ordinary probability and 
statistics on the one hand and their quantum counterparts on the other hand. 
However, a general theory encompassing both sides, based on a reasonably 
intuitive foundation, has until now been lacking. 

The main purpose of the present paper is indeed to suggest a new ap- 
proach to the statistical foundation of quantum mechanics based on elemen- 
tary concepts such as choice of experiment, probability model, complemen- 
tarity, symmetry and model reduction. I claim that this approach leads to a 
conceptual basis which is more intuitive than the usual one. This is of course 
a very bold statement, knowing how well established the ordinary quantum 
formalism is, especially since the program started here also needs further 
development. Nevertheless, I will claim that for readers knowing statistical 
theory and some group theory, the present approach will probably be more 
enlightening than the usual formalism. 

In addition to the implications for quantum theory, the concepts needed 
to complete this program, and also concepts learned directly from quantum 
theory, may at the same time turn out to lead to an enrichment of current 
statistical theory. 

An example is the concept of complementarity; in our approach this de- 
notes the situation where two parameters cannot both be estimated accu- 
rately in a given context, but it can also be given a wider content. In our 
opinion this concept should not be confined to the microworld. This view is 
also in line with Bohr [16], who gave talks explaining the concept of com- 
plementarity to, among others, biologists and sociologists. 

A related generalization of the ordinary statistical paradigm will in fact 
be basic to our main setting: Before we look at the parameter of a concrete
experiment, we consider all questions that can be addressed in any experiment
in a given context. Thus there is a total parameter $\phi$, which is a vector
containing all theoretical quantities that can be imagined for a given system.
Any experiment which is chosen has a parameter that is a function of $\phi$, but
$\phi$ itself has too rich a content to be estimated. Some ordinary statistical
situations that can be fit into this pattern are: 

Example 1. Consider all quantities of relevance that are contemplated
at the experimental design phase. This can be made concrete in many dif- 
ferent directions. 

Example 2. A questionnaire is designed for a statistical investigation 
with a fixed number of alternatives for each question. Some respondents 
insist on giving unexpected but informative answers, say, comments in addition
to the fixed questions. The total parameter $\phi$ may contain some such
possibilities. 

Example 3. More generally: A statistical investigation on some group
of humans is performed, say, through a questionnaire. Let $\phi$ contain all
possible information about these humans which may have some relevance to
the concrete questions posed.

Example 4. There is a fragile apparatus for some specific length measurement
which is destroyed after one measurement. Let $\mu$ be the length
which is to be measured. Assume furthermore that the standard deviation
of measurement $\sigma$ can only be estimated by destroying the apparatus. Let
then $\phi = (\mu, \sigma)$.

Example 5. Assume that a particular patient has an expected survival
time $\lambda^1$ if he gets treatment 1 at a specific time t, and expected survival
time $\lambda^2$ if he gets treatment 2 at that time. Here "expected" is not primarily
meant in relation to a probability model, but may at this point be related to
what is expected by the medical experts taking into account all knowledge
they have about the patient and about the treatments. Then $\phi = (\lambda^1, \lambda^2)$
can never be estimated.

Example 6. Let there be two questions which are to be asked of an
individual, where we know that the answer will depend on the order in
which the questions are posed. Let $(\lambda_1, \lambda_2)$ be the expected answer when
the questions are posed in one order, and $(\lambda_3, \lambda_4)$ when the questions are
posed in the other order. Then $\phi = (\lambda_1, \ldots, \lambda_4)$ cannot be estimated from
one individual.

Many more realistic, moderately complicated, examples exist, like the 
behavioral parameters of a rat taken together with parameters of the brain 
structure which can only be measured if the rat is killed. 
We will concentrate much on the statistical parameter space. An essential
point of the statistical paradigm is that, before the experiment, the parameter
$\lambda$ is unknown; afterward it is as a rule fairly accurately determined. In
this way the focus is shifted from what the value of the parameter "is" to the 
knowledge we have about the parameter. In a physical context this can eas- 
ily be made consistent with the point of view expressed by Niels Bohr, cited 
from [51]: "It is wrong to think that the task of physics is to find out how 
nature is. Physics concerns what we can say about nature." This statement 
is also in agreement with current views of quantum theory, as expressed, for 
instance, by Fuchs [27]. 

It is well known that there exists in the literature a large number of sug- 
gestions for interpretations of quantum theory; a very incomplete list is given 
by the references [13, 15, 20, 25]. Most of these interpretations include the 
ordinary minimalistic interpretation of Niels Bohr (the Copenhagen school 
or pragmatic interpretation, concentrating on interpreting the outcomes of
concrete experiments; for more details see [39]). The present article also
implies a particular statistical interpretation related to the Niels Bohr interpretation,
but it is beyond the scope of this paper to discuss in detail
relations to other interpretations given in the literature.

There are also a few related papers in the recent literature. Bohr and Ulfbeck
[14] discuss a foundation of quantum mechanics which is based upon
irreducible representations of groups, and thus uses symmetry in a way which
is similar to ours. Caves, Fuchs and Schack [19] propose a Bayesian approach
to quantum theory based upon Gleason's powerful Hilbert space theorem. 
Here we will avoid taking an abstract Hilbert space as a point of departure, 
but we will arrive at it from a rather concrete setting. Finally, Hardy [32] de- 
rives quantum theory and probability theory from a few reasonable axioms, 
without going into any details concerning the state concept. 

Sections 2-7 below are preparatory: In Section 2 group actions on the sam- 
ple space and on the parameter space of an experiment are discussed, and 
the concept of permissibility is introduced. In Section 3 it is shown that per- 
missibility always can be achieved by going to a subgroup; such a subgroup 
connected to an experimental parameter will be important later. In Section 4 
the relation to causal inference, in particular to the concept of counterfactu- 
als, is discussed, while in Section 5 the main quantum-mechanical example, 
electron spin, is treated. Section 6 gives the starting point sketched in the 
abstract above: reduction of the cartesian product of the parameter spaces 
of complementary experiments, while Section 7 treats model reduction in 
general and introduces the concept of group representation. 

Then in Sections 8-10 the basic Hilbert space is introduced, first for a
single experiment and then tied together for several complementary experiments.
The treatment in these sections could have been simplified considerably
by concentrating on the parameter space. The full discussion involving
the sample space is included mainly for three reasons, however: First, this
paves the way for further generalizations. Second, the context of an experiment
is related to the limitation of the data that can be obtained, and this
context is felt to play a role in the quantization. Third, a discussion of the 
full experiment is needed later in Section 12. 

Before that, in Section 11, operators and states are introduced. 

An important result is proved in Section 12: Born's formula for the transition
probability between experiments. From this, the basic formalism of
elementary quantum mechanics is derived in Section 13. 

In what follows, we will make several explicit assumptions; most of them
are relatively weak and fairly natural in a statistical setting. The excep- 
tions to this are Assumption 5, which is a simple assumption about the 
connection between the parameter spaces associated with different choices 
of experiments; Assumption 7, which through a limitation of the parameter 
space serves to restrict us to a discussion of elementary quantum theory; 
and finally, Assumption 8, which gives the symmetry assumption needed to 
derive Born's formula and from this the formalism of elementary quantum 
mechanics. 

2. Statistical models and groups. In general the total parameter space
$\Phi$, the range of the total parameter $\phi$, can have almost any structure; in
this paper we will assume: 

Assumption 1. $\Phi$ is a locally compact topological space. There is a
transformation group G acting on $\Phi$ which satisfies certain weak technical
requirements (see Appendix A.1) so that $\Phi$ can be given a right invariant
measure $\nu$, that is, a measure which satisfies $\nu((d\phi)g) = \nu(d\phi)$.

Note that in this paper, group actions will always be written to the right: 
$\phi \mapsto \phi g$. The reason for this is simply that it facilitates the introduction of
the right invariant measure, which from several points of view [34] in the case 
of a single parameter can be argued to be the best choice of a noninformative 
prior under symmetry in ordinary Bayesian statistical inference. 

The right invariant measure is unique (up to a fixed constant) for transi- 
tive transformation groups, that is, group actions where the space consists of 
one single orbit. An orbit is defined as a set of the form $\{\phi : \phi = \phi_0 g;\ g \in G\}$.
In general the space $\Phi$ can be divided into several orbits, and the invariant
measure is unique on each orbit; it must be supplemented by some measure
on the orbit indices in order to give a measure on the whole space $\Phi$.

When a group G is defined on the (total) parameter space $\Phi$, an important
property that an experimental parameter may or may not have is the
following (cf. McCullagh [44], who chose to call this concept natural): 
Definition 1. The parameter $\lambda$ is called permissible as a function $\lambda(\phi)$
if it satisfies:

If $\lambda(\phi_1) = \lambda(\phi_2)$ then $\lambda(\phi_1 g) = \lambda(\phi_2 g)$ for all $g \in G$.

The most important argument for this restriction is that it leads to a 
uniquely defined action of the group G on the image space $\Lambda$ of $\lambda(\phi)$:

(1) $(\lambda g)(\phi) = \lambda(\phi g)$.

Several general arguments for permissibility are given in [33, 34]: When
this property holds, the best equivariant estimator, which essentially is the
Bayes estimator under prior $\nu$, is conserved under model reduction using
functions of $\lambda$. Also, in the transitive case credibility intervals under the
invariant prior turn out to be identical to confidence intervals, and certain 
paradoxes related to Bayes estimation are avoided. 

Trivially, the total parameter $\lambda = \phi$ itself is permissible. Also, the vector
parameter $(\lambda^1, \ldots, \lambda^k)$ is permissible if each $\lambda^i$ is permissible.

As will be shown in the next section, if $\lambda$ is not permissible with respect
to G, one can always define a maximal subgroup with respect to which $\lambda$ is
permissible. This will be the usual case in our setting. 

Let now a general group D of transformations be defined on the parameter
space $\Lambda$, the range of $\lambda$. This transformation group D will be kept fixed,
being thought of as a part of the specification of the problem in addition to 
the statistical model. 

Sometimes a group D of transformations on the sample space is defined 
first, and then the actions on the parameter space are introduced via the 
statistical model by defining probability measures $P^{\lambda g}$ for $g \in D$ on the
sample space X by

(2) $P^{\lambda g}(B) = P^{\lambda}(B g^{-1})$ for sets B.

Then the connection between these two transformation groups is a homomorphism:
If $g_1$ and $g_2$ are taken to act on the two spaces X and $\Lambda$, then
$g_1^{-1}$ and $g_1 g_2$ act on both spaces in the same way. The concept of homomorphism
will be fundamental to this paper. It means that we have very similar
group actions: The identity element, inverses and subgroups are mapped as 
they should be between the two transformation groups; that is, the essential 
structure is inherited. This is the reason why the same symbol D can and 
will be used for both transformation groups. If g is mapped by (2) into the 
identity e only when g = e, then the homomorphism will be an isomorphism: 
The structures of the two groups are then essentially identical. If in addition 
a one-to-one correspondence can be established between the spaces upon 
which the groups act, everything will be equivalent. 
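Equation (2) can be checked numerically in a simple case. The sketch below is only an illustration under assumptions that are not in the text: the model is taken to be normal with $\lambda = (\mu, \sigma)$, the group element acts on the sample space by $x \mapsto a + bx$ with $b > 0$, and on the parameter space by $(\mu, \sigma) \mapsto (a + b\mu, b\sigma)$. All function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_interval(c, d, mu, sigma):
    """P^lambda(B) for the interval B = [c, d] and lambda = (mu, sigma)."""
    return normal_cdf(d, mu, sigma) - normal_cdf(c, mu, sigma)

def act_on_parameter(mu, sigma, a, b):
    """Assumed parameter-space action: lambda g = (a + b*mu, b*sigma)."""
    return a + b * mu, b * sigma

def inverse_on_sample(x, a, b):
    """Inverse of the assumed sample-space action x g = a + b*x (requires b > 0)."""
    return (x - a) / b

def both_sides_of_eq2(c, d, mu, sigma, a, b):
    """Return (P^{lambda g}(B), P^lambda(B g^{-1})) for B = [c, d]."""
    lhs = prob_interval(c, d, *act_on_parameter(mu, sigma, a, b))
    rhs = prob_interval(inverse_on_sample(c, a, b),
                        inverse_on_sample(d, a, b), mu, sigma)
    return lhs, rhs
```

For any interval and any $b > 0$ the two returned numbers should agree up to rounding, which is what it means for the two actions to be linked through the model.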
A further discussion of symmetry groups in statistics is given in [34] and in
Appendix A.1. Note that the existence of a group D acting on the parameter
space $\Lambda$ in fact requires very few explicit invariance properties. What is
needed is basically: (i) The sample space and the parameter space should
both be closed under the transformations in the group. (ii) If the problem
is formulated in terms of a loss function, this should be unchanged when
observations and parameters are transformed conformably by the group. (iii)
If a noninformative prior on $\Lambda$ is needed, the right invariant distribution $\nu$
on this space should be used.

3. Experimental parameters and permissibility. Assuming that a parameter
or total parameter $\phi$ is used to model some given part of reality,
there are usually many questions that can be investigated in such a setting.
Very often different such questions are addressed by performing different experiments
on the specific part of reality in question. (A related case is when
different questions are addressed within the same experiment, e.g., when
statisticians consider different sets of orthogonal contrasts in an analysis of 
variance experiment.) 

Let $\mathcal{A}$ be the set of such questions, from now on in this paper assumed to
be connected to different experiments.

Assumption 2. For each $a \in \mathcal{A}$ there is a parameter $\lambda^a = \lambda^a(\phi)$, for
which we assume that a probability model $P^{\lambda^a}(\cdot)$ exists corresponding to
experiment a. It is assumed that each experiment is maximal, that is, that
there exists no possible experiment with parameter $\mu^a$ such that $\lambda^a$ is a
proper function of $\mu^a$.

In a physical context, $P^{\lambda^a}(\cdot)$ should be the probability measure for the
measurement apparatus, at the present moment left unspecified. 

When we in the sequel talk about choice of experiment/question a, we really
mean a choice of $(a, \lambda^a)$. But the probability measure $P^{\lambda^a}(\cdot)$ is thought
to be connected to the measurement apparatus, and is not at the outset included
in this choice. Quantum probabilities are first introduced in Theorem
5.

When a transformation group G is defined on the (total) parameter space
$\Phi$, an important property of the experimental parameter $\lambda^a$ is whether it is
a permissible function $\lambda^a(\phi)$. As already said, the most important argument
for this restriction is that it leads to a uniquely defined transformation group
$G^a$ on the image space $\Lambda^a$ of $\lambda^a(\phi)$, so that $(\lambda^a g^a)(\phi) = \lambda^a(\phi g^a)$ for $g^a \in G^a$.

As a simple illustration of a group connected to a parameter space or
the total parameter space, look at the (total) parameter $\phi = (\mu, \sigma)$ with the
translation/scale group $(\mu, \sigma) \mapsto (a + b\mu, b\sigma)$ where $b > 0$. The following
one-dimensional parameters are permissible: $\mu$, $\sigma$, $\mu^3$, $\mu + \sigma$, $\mu + 3\sigma$, and if
such a parameter is asked for, say as a focus parameter, all of these
are valid candidates.

On the other hand, the following parameters are not permissible, and 
would according to McCullagh [44] lead to absurd focus parameters under 
this group: $\mu + \sigma^2$, $\sigma e^{\mu}$, $\tan(\mu)/\sin(\sigma)$.

A further example is given by the coefficient of variation $\sigma/\mu$. This is not
permissible. (The location part of the transformation does not make sense
here.) But it will be permissible if the group is reduced to the pure scale
group $(\mu, \sigma) \mapsto (b\mu, b\sigma)$, $b > 0$. This points at an important general

Principle. If a focus parameter $\lambda^a(\phi)$ is not permissible with respect to
the basic group G, then take a subgroup $G^a$ so that it becomes permissible
with respect to this subgroup.
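The permissibility examples above can be probed numerically. The following is a minimal sketch, not a proof: it takes two points of $\phi = (\mu, \sigma)$ on which a candidate parameter agrees, applies one group element, and reports whether the parameter still agrees. A failure is a genuine counterexample to permissibility; a success for one pair is only supporting evidence. The helper names and the particular points are chosen for illustration.

```python
def location_scale(phi, a, b):
    """Translation/scale action (mu, sigma) -> (a + b*mu, b*sigma), b > 0."""
    mu, sigma = phi
    return (a + b * mu, b * sigma)

def pure_scale(phi, b):
    """Pure scale action (mu, sigma) -> (b*mu, b*sigma), b > 0."""
    mu, sigma = phi
    return (b * mu, b * sigma)

def agrees_after(lam, phi1, phi2, transform, tol=1e-9):
    """Given lam(phi1) == lam(phi2), check whether equality survives transform."""
    if abs(lam(phi1) - lam(phi2)) > tol:
        raise ValueError("phi1 and phi2 must give the same parameter value")
    return abs(lam(transform(phi1)) - lam(transform(phi2))) <= tol

def g(phi):
    """One fixed location/scale group element used for the checks."""
    return location_scale(phi, a=1.0, b=2.0)

# mu + 3*sigma: equality is preserved (consistent with permissibility).
ok_linear = agrees_after(lambda p: p[0] + 3 * p[1], (0.0, 1.0), (1.5, 0.5), g)

# mu + sigma**2: equality is destroyed, so this parameter is not permissible.
ok_square = agrees_after(lambda p: p[0] + p[1] ** 2, (0.0, 1.0), (0.75, 0.5), g)

# Coefficient of variation sigma/mu: fails under location/scale,
# but survives under the pure scale subgroup.
def cv(p):
    return p[1] / p[0]

ok_cv_full = agrees_after(cv, (1.0, 2.0), (2.0, 4.0), g)
ok_cv_scale = agrees_after(cv, (1.0, 2.0), (2.0, 4.0), lambda p: pure_scale(p, 2.0))
```

Here `ok_linear` and `ok_cv_scale` come out true while `ok_square` and `ok_cv_full` come out false, matching the classification in the text and the reduction to a subgroup described in the Principle.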

Lemma 1. Given a parameter $\lambda^a$, there is always a maximal subgroup
$G^a$ of G such that $\lambda^a$ is permissible with respect to $G^a$.

Proof. Let $G^a$ be the set of all $g \in G$ such that for all $\phi_1, \phi_2 \in \Phi$ we have
that $\lambda^a(\phi_1) = \lambda^a(\phi_2)$ if and only if $\lambda^a(\phi_1 g) = \lambda^a(\phi_2 g)$. Then $G^a$ contains the
identity. Furthermore, using the definition with $\phi_1, \phi_2$ replaced by $\phi_1 g_1, \phi_2 g_1$,
it follows that $g_1 g_2 \in G^a$ when $g_1 \in G^a$ and $g_2 \in G^a$. Using the definition with
$\phi_1, \phi_2$ replaced by $\phi_1 g^{-1}, \phi_2 g^{-1}$, it is clear that it contains inverses. Hence
$G^a$ is a group. It follows from the construction that it is maximal. □
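The construction in the proof can be carried out by brute force on a small finite example. The sketch below is purely illustrative and uses assumptions that are not in the paper: the total parameter space is taken to be $\{0, \ldots, 5\}$, the group is the cyclic group of order 6 acting by addition modulo 6, and the experimental parameter records which half of the space contains $\phi$.

```python
from itertools import product

PHI = range(6)   # hypothetical total parameter space
G = range(6)     # cyclic group Z_6, written additively

def act(phi, g):
    """Right action phi g = (phi + g) mod 6."""
    return (phi + g) % 6

def lam(phi):
    """Hypothetical experimental parameter: 0 on {0,1,2}, 1 on {3,4,5}."""
    return 0 if phi < 3 else 1

def in_subgroup(g):
    """Membership test from the proof of Lemma 1:
    lam(p1) == lam(p2) if and only if lam(p1 g) == lam(p2 g), for all p1, p2."""
    return all(
        (lam(p1) == lam(p2)) == (lam(act(p1, g)) == lam(act(p2, g)))
        for p1, p2 in product(PHI, repeat=2)
    )

# The maximal subgroup with respect to which lam is permissible.
G_a = [g for g in G if in_subgroup(g)]

# lam is not permissible under the full group, since some g fail the test.
permissible_under_G = all(in_subgroup(g) for g in G)
```

In this toy case `G_a` consists of the identity and the shift by 3, which swaps the two halves; it is closed under the group operation, and the induced action on the image space $\{0, 1\}$ exchanges the two values.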

From this it follows that the group $G^a$ also acts on $\Lambda^a = \lambda^a(\Phi)$, by a
simple homomorphism determined as in (1).

4. Experimental parameters and counterfactuals. In our view this choice 
of experiment can also be related to the literature on causal inference, in 
particular to the concept of counterfactuals, which has a central place there. 
A counterfactual question is a question of the form: "What would the result 
have been if ...?" A counterfactual variable, in the way this concept is used
in the literature, is a hypothetical variable giving the result of performing an 
experiment under some specific condition a, when this condition a is known 
not to hold. A typical example is when several treatments can be allocated 
to some given experimental unit at some fixed time, and then in reality only 
one of these treatments can be chosen. 

The use of such a concept goes back to Neyman [48], and has in recent 
decades been discussed by, among others, Rubin [54], Robins [52, 53], Pearl
[50] and Gill and Robins [29]. On the other hand, Dawid [21] is skeptical of
an extensive use of counterfactuals. The discussion of the last paper shows 
some of the positions taken by several prominent scientists on this issue. 
In our setting, we choose and perform one experiment a, and then any
other experiment b imagined at the same time must be regarded as a counterfactual
experiment. However, instead of introducing counterfactual variables,
I use counterfactual parameters $\lambda^a$, which in my view is a more useful
concept. Parameters are hypothetical entities that usually cannot be observed
directly. Nevertheless they may be useful in our mental modeling of
phenomena and in our discussion of them. In the last decades, such mental
models in causal inference have been developed to great sophistication,
among other ways by using various graphical tools [41, 50]. In the present 
paper we will limit mental models to scalar and vector parameters, some 
counterfactual, leading to what we have called a total parameter, but this 
model concept can in principle be generalized. 

When it is decided to perform one particular experiment $a \in \mathcal{A}$, the $\lambda^a$
becomes the parameter of this specific experiment, an experiment which
then also may include a technical or experimental error. In any case, the
experiment will give an estimate $\hat{\lambda}^a$. If the technical error can be neglected,
we have a perfect experiment, implying $\hat{\lambda}^a = \lambda^a$.

We are here at a crucial point for understanding the whole theory of 
this paper, namely the transition from the unobserved parameter to the 
observed variable. Let us again look at a single patient at some given time 
who can be given two different treatments. Define $\lambda^a$ as the expected survival
time of this patient under treatment a. Then make a choice of treatment,
say a = 1. Ultimately, we then observe a survival time $t^1$ for this patient.
There is no technical error involved here, so we might say that we then have
$\hat{\lambda}^1 = \lambda^1 = t^1$. And this is in fact true. By definition, $\lambda^1$ is connected to the
single patient, the definite treatment time and a definite choice of treatment.
So even though $\lambda^1$ is defined at the outset as an unknown parameter, its
definition is such that, once the experiment is carried out, the parameter
must by definition take the value $t^1$.

This simple, but crucial phenomenon, which is related to how a concept 
can be defined in a given situation, is in my view of quantum mechanics 
closely connected to what physicists call "the collapse of the wave packet" 
when an observation is undertaken. 

5. A quantum particle with spin. Perhaps the simplest quantum-mechanical
system is an electron with its spin. The spin component $\lambda$ can be
measured in any space direction a, and $\lambda$ always takes one of the values −1 or
+1. Given such a (perfect) measurement, this defines in the usual quantum 
formalism a certain state vector v in a complex two-dimensional vector space 
H, formally as the eigenvector of an operator corresponding to the given 
measurement with the given measurement value as eigenvalue. And given 
this state vector v, quantum mechanics offers formulae, versions of which will 
be discussed later, for predicting the results of further measurements. This 
quantum-mechanical model for the electron also has several applications to
other systems. The setup itself is generally called a qubit in the literature. 

As a contrast to this formalism, and to illustrate the general theory of 
this paper, we give a nonstandard description of a particle with spin, a 
description which will turn out in the end to be essentially equivalent to the 
one given by ordinary quantum theory. 

The total parameter $\phi$ corresponding to electron spin may be defined as a
vector in three-dimensional space; the direction of the vector gives the spin
axis, the norm gives the spinning speed. The associated group G is then the
group of all rotations of this vector in $\mathbb{R}^3$ around the origin. At the outset, $\phi$
is a model quantity and hence unknown. As indicated before, we will assume 
throughout that such a total parameter can never assume a definite value 
in the sense that it never can be estimated. Nevertheless, such an abstract 
quantity turns out to be useful in model discussions. 

Now let the electron have such a total parameter $\phi$ attached to it. Assume
first that the system defines a context such that it is only possible to estimate
some given component of $\phi$. From this point of view, the most that we can
hope to be able to measure is the angular momentum component
$\theta^a(\phi) = \|\phi\| \cos(\alpha)$ in some direction given by a unit vector a, where $\alpha$ is the angle
between $\phi$ and a.

The function $\theta^a(\cdot)$ is easily seen to be nonpermissible for fixed a. This
is simply because two vectors with the same component along a in general
will have different such components after a rotation. The maximal possible
choice of the group $G^a$ with respect to which $\theta^a(\cdot)$ is permissible is the group
of rotations of the unit vector around the axis a, possibly together with a
180° rotation around any axis perpendicular to a.
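The argument can be illustrated with two concrete vectors. The sketch below assumes, for illustration only, that a is the z-axis, so that $\theta^a(\phi)$ is simply the third coordinate, and it applies rotations as ordinary coordinate maps (the paper's right-action convention plays no role in this check). The two test vectors are arbitrary choices with the same component along a.

```python
import math

def rot_z(phi, angle):
    """Rotate phi about the z-axis (the axis a in this illustration)."""
    x, y, z = phi
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rot_x(phi, angle):
    """Rotate phi about the x-axis, which is perpendicular to a."""
    x, y, z = phi
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def theta_a(phi):
    """Component of phi along a = (0, 0, 1)."""
    return phi[2]

phi1 = (1.0, 0.0, 0.5)
phi2 = (0.0, 2.0, 0.5)   # same component along a as phi1

# A general rotation (90 degrees about x) separates the two components.
split = (theta_a(rot_x(phi1, math.pi / 2)), theta_a(rot_x(phi2, math.pi / 2)))

# A rotation about a leaves both components unchanged.
kept = (theta_a(rot_z(phi1, 0.7)), theta_a(rot_z(phi2, 0.7)))

# A 180 degree rotation about a perpendicular axis flips the sign of both.
flipped = (theta_a(rot_x(phi1, math.pi)), theta_a(rot_x(phi2, math.pi)))
```

The pair `split` has unequal entries, which is the failure of permissibility under the full rotation group, while `kept` and `flipped` each have (numerically) equal entries, as expected for elements of $G^a$.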

The group $G^a$ also acts on the image space for $\theta^a$. This group action has
several orbits: For each $\kappa \in (0,1]$, one orbit is given by the two-point set
$\{-\kappa, \kappa\}$ in $\Theta^a$. In addition there is an orbit for $\kappa = 0$.

We want in general that any reduction of the parameter space should be
to an orbit or to a set of orbits. Since the value of $\kappa$ may be considered to
be arbitrary, we concentrate on $\lambda^a = \mathrm{sign}(\theta^a)$, taking the two values −1 and
+1. This also implies that the function $\lambda^a(\phi)$ is permissible with respect
to the group $G^a$, and that this group acts upon $\lambda^a$ by exchanging its two
values. Assume now that the electron in itself defines such a context that
only $\lambda^a$ can be measured, an assumption which is consistent with experience.
The apparatus usually used to measure such a discretized spin component 
is called a Stern-Gerlach device. 

The unconditional prior probability for $\lambda^a$ is 1/2 for each of the values
±1 by symmetry. Assume now that we know that $\lambda^a = +1$, and that we afterward
will measure the spin component in another direction b. We assume
for simplicity that we have an ideal measurement apparatus in the direction
b, so that what we seek is the transition probability in parameter space,

$P(\lambda^b = +1 \mid \lambda^a = +1)$.

The formal quantum-mechanical solution of this is well known in the
physics literature. Let the components of the (unit) a-vector be $(a_x, a_y, a_z)$,
and let $\sigma_x$, $\sigma_y$ and $\sigma_z$ be the three Pauli spin operators

(3) $\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$

Calculate the eigenvector $v_a$ for the operator $a_x \sigma_x + a_y \sigma_y + a_z \sigma_z$ corresponding
to the eigenvalue +1, and do a similar thing in the b-direction. Then the
formalism of quantum mechanics (see Section 14 below) says that

(4) $P(\lambda^b = +1 \mid \lambda^a = +1) = |v_a^{\dagger} v_b|^2$.

A straightforward calculation then gives

(5) $P(\lambda^b = +1 \mid \lambda^a = +1) = (1 + \cos(u))/2$,

where u is the angle between the a-vector and the b-vector.
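The step from (4) to (5) can be verified numerically. The sketch below uses a closed form for the +1 eigenvector of $a_x \sigma_x + a_y \sigma_y + a_z \sigma_z$, namely the normalized vector $(1 + a_z,\ a_x + i a_y)$, which is valid for any unit vector a with $a_z \neq -1$; this closed form and the function names are supplied here for illustration and are not taken from the paper.

```python
import math

def spin_operator(a):
    """Matrix of a_x*sigma_x + a_y*sigma_y + a_z*sigma_z for a unit vector a."""
    ax, ay, az = a
    return [[complex(az), complex(ax, -ay)],
            [complex(ax, ay), complex(-az)]]

def eigvec_plus(a):
    """Normalized eigenvector with eigenvalue +1 (assumes a is unit, a_z != -1)."""
    ax, ay, az = a
    v = [complex(1.0 + az), complex(ax, ay)]
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return [v[0] / norm, v[1] / norm]

def transition_probability(a, b):
    """Right-hand side of (4): squared modulus of the inner product of v_a and v_b."""
    va, vb = eigvec_plus(a), eigvec_plus(b)
    inner = va[0].conjugate() * vb[0] + va[1].conjugate() * vb[1]
    return abs(inner) ** 2

def born_closed_form(a, b):
    """Right-hand side of (5): (1 + cos u)/2, with cos u = a . b for unit vectors."""
    cos_u = sum(x * y for x, y in zip(a, b))
    return (1.0 + cos_u) / 2.0
```

For example, with a along the x-axis and b along the y-axis the angle is 90° and both functions give 1/2; for a = b both give 1.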

A general statistical approach to transition probabilities is given in The- 
orem 5 below. 

6. Parameters of several statistical experiments. Up to now, we have
assumed the existence of a total parameter. This section gives a very general
alternative way to arrive at this concept.

Consider a set $\mathcal{A}$ of mutually exclusive experiments, each of the ordinary
statistical kind, but we will concentrate on the parameter spaces $\Lambda^a$; $a \in \mathcal{A}$.
The whole set of parameters of the experiments is given by points in the big
space

$\Pi = \times_{a \in \mathcal{A}} \Lambda^a,$

a Cartesian product. If all parameter spaces have the same structure $\Lambda$, this
can be considered to be the set of functions from $\mathcal{A}$ to $\Lambda$.
Let there be defined a transformation group G on $\Pi$.

Example 7 (Compare Example 5). Let $\pi = (\lambda^1, \lambda^2)$, where $\lambda^1$ and $\lambda^2$
are the expected lifelengths of a single patient under two mutually exclusive
treatments. Let G be the joint set of time scale transformations together
with the exchange $\lambda^1 \leftrightarrow \lambda^2$.

Example 8. Consider again the electron spin. Let $\pi = (\lambda^a; a \in \mathcal{A})$, where
$\lambda^a$ is the spin component $\pm 1$ of a perfect measurement in the direction a of
an electron. Let G be the group generated by the transformations:






(i) Inversions: $\lambda^a \mapsto -\lambda^a$.

(ii) Rotations of experiments: If $a \mapsto ao$ under a rotation o, replace each
$\lambda^a$ with $\lambda^{ao}$. This gives a permutation within the Cartesian product.

Note in general that the points of $\Pi$ make sense mathematically, but not
directly physically, hence it does not make sense in a physical context to
give values to the individual points of this space. The space $\Pi$ will hence
not be called a state space.

So what operations are meaningful with the spaces $\Pi$? I have mentioned
group operations. One can also adjoin such spaces corresponding to different
systems, and adjoin $\pi$ with some other parameter. Finally, one can look at
subspaces.

Assume that the experiments are related in some way. Then it may be
reasonable to try to reduce the space $\Pi$. The purpose of this reduction may
be to achieve parsimony. This should not be thought of as an approximation,
however, but may be a result of some physical theory. Note that theories
are formulated not in terms of observations, but in terms of parameters, the
theoretical language behind observations.

Let $\Pi$ be reduced to a subspace $\Psi$ with the property:

Property 1. $\Psi$ is an orbit, that is, a set of the form $\{\pi : \pi = \pi_0 g;\ g \in G\}$,
or a set of orbits for the group G. Use the notation G also for this group
acting on $\Psi$.

This is a necessary condition in order that G should be a transformation 
group on the reduced space. It is also consistent with the discussion elsewhere 
in this paper. In [34] there are given several examples of model reductions 
connected to single experiments where the reduced space is an orbit or a set 
of orbits of an associated transformation group. 

It is natural in certain situations to demand also: 

Property 2. Each section $\{\pi \in \Pi : \lambda^a(\pi) = \lambda_0\}$ has a nonempty
intersection with $\Psi$ for a set of specified values $\lambda_0$.

In fact, this will always be true for some values $\lambda_0$. In a future publication
we hope to use this fact together with some group representation theory to
discuss quantization itself.

Let now the model reduction be associated with some function $\phi$ on $\Pi$
which is one-to-one on the subset $\Psi$ and undefined elsewhere. It follows then
from Property 1 that the group G is well defined on the range of $\phi$.

Definition 2. If such a function $\phi$ exists, call $\Phi = \phi(\Psi)$ the total parameter
space. Any function $\phi$ with the above properties is called a total parameter.

A total parameter $\phi$ can in principle be replaced by any other total
parameter in one-to-one correspondence with $\phi$. But it is important to have a
simple representation.

If Property 2 holds, then each $\lambda^a$ can be regarded as a function on $\Phi$.

Example 8 (continued). Restrict $\Pi$ to the subset $\Psi$, the set of all $\pi$
such that there exists a vector $\phi$ that gives each $\lambda^a$ equal to $\mathrm{sign}(a \cdot \phi)$. Let
$\phi(\pi)$ be this direction normed as a unit vector.

- Taken as a unit vector, $\phi(\pi)$ is a unique function of $\pi$.

Proof. Suppose that there is a $\pi$ which corresponds to two different
unit vectors $\phi_1$ and $\phi_2$. Then $a = \phi_1 - \phi_2$, normalized, gives $\lambda^a = +1$
corresponding to $\phi_1$ and $\lambda^a = -1$ corresponding to $\phi_2$, a contradiction. □

- The set $\Psi$ is an orbit of G.

Proof. It is easy to see that $\Psi$ is closed under inversions and rotations. □

- All sections $\{\pi : \lambda^a(\pi) = \pm 1\}$ have nonempty intersections with $\Psi$.

Proof. Obvious. □

From this, we are back to the situation discussed in Section 5.

7. Experiment, model reduction and group representation. Now let the
experimentalist have the choice between different experiments $a \in \mathcal{A}$ on the
same unit(s), where the experiment a consists of measuring some $y^a$, with
$y^a = y^a(\omega)$ being a function on some sample space S, and where the
measurement process is modeled with a parameter $\lambda^a$. This parameter is a part
of the model description of the units, and all the model parameters may be
seen as functions $\lambda^a(\phi)$ of a total parameter $\phi$.

We use a common sample space S for all experiments a, since this space
can be imagined in terms of a common measurement apparatus or some set
of apparatus. Specifically we assume:

Assumption 3. There is a common sample space S. The reduced model
probability measures $P^{\lambda^a}$ are jointly dominated, that is, absolutely
continuous with respect to a fixed probability measure P on the sample space
S.






In the electron case this simply means that one in principle can assume
that the same or the same kind of Stern-Gerlach apparatus can be used for
every measurement. The measure P can be assumed to be Bernoulli(1/2).

In the previous section, a global model reduction was introduced by
reducing the large space $\Pi$ to one or a few orbits of the basic group G. As in
the electron spin example, it may also be natural or necessary to reduce the
original parameter $\theta^a$ to a new parameter $\lambda^a$. All such model reduction is
done by selecting one or a few orbits of the relevant group $G^a$.

The most important theoretical argument for model reduction associated 
with orbits of the group is the following: All models should have a parameter 
space which is invariant under the group. For the reduced model this is only 
possible when the parameter space in question is composed of orbits of the 
relevant group. 

Here is another argument: The Pitman estimator is equal to the Bayes 
estimator under right invariant prior, and this estimator is important in 
many applications. In order that this shall make sense for the reduced model, 
the parameter space of this reduced model must be constructed from orbits 
of the parameter group actions. 

A further discussion of model reduction under symmetry in statistics and 
in quantum mechanics will be given elsewhere, and we then also hope to 
relate the discussion to the concept of group representation, which is very 
useful in quantum theory. 

Generally (see also Appendix A.2), a group representation is a class of
operators $\{U(g); g \in G\}$ on a vector space V, where G is a group,
such that the operators satisfy the property $U(gh) = U(g)U(h)$. This gives
a group of operators homomorphic to the group G, and, as the name says, it
is used to represent the group in a specific way. There is a large mathematical
literature on group representations.

Specifically, the regular representation U(G) on $L^2(\Phi, \nu)$, where $\nu$ is a
right invariant measure for the basic group G, is given by

(6) $U(g)f(\phi) = f(\phi g).$

Explicitly, this implies that U(G) is a group of linear operators acting on
$L^2(\Phi, \nu)$. The group property of U(G) is well known and easily verified. The
same formula (6) is valid for any subspace V of $L^2(\Phi, \nu)$ which is invariant
under the group of operators U(G), that is, such that $U(g)f \in V$ when $f \in V$
and $g \in G$.
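
The group property of (6) can be made concrete in a toy setting. The sketch below is an illustration under assumptions that are not in the text: $\Phi$ is taken to be a three-point set, G the group of all its permutations acting from the right, and $\nu$ the counting measure. All names are hypothetical.

```python
from itertools import permutations

# A group element g is a tuple with g[phi] = phi g (the image of the point phi).

def compose(g, h):
    """Product gh for a right action: phi(gh) = (phi g)h."""
    return tuple(h[g[phi]] for phi in range(len(g)))

def U(g, f):
    """Regular representation (6): (U(g)f)(phi) = f(phi g), with f stored as a tuple."""
    return tuple(f[g[phi]] for phi in range(len(g)))

group = list(permutations(range(3)))   # all permutations of a three-point Phi
f = (2.0, -1.0, 5.0)                   # an arbitrary function on Phi

# Homomorphism property U(gh) = U(g)U(h), checked on f for every pair (g, h).
homomorphism_ok = all(
    U(compose(g, h), f) == U(g, U(h, f))
    for g in group for h in group
)

# Under the counting measure, U(g) permutes the values of f and so keeps the L^2 norm.
norm_preserved = all(
    sum(x * x for x in U(g, f)) == sum(x * x for x in f)
    for g in group
)

print(homomorphism_ok, norm_preserved)
```

The order of composition matters: with a right action, applying U(h) first and U(g) second corresponds to the product gh, which is what the definition of `compose` encodes.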

We will also consider group representation spaces of the group $G^a$ acting
on $\phi$. Let $\lambda^a$ be a permissible function of $\phi$. Then

$V_{\lambda^a} = \{f \in L^2(\Phi, \nu) : f(\phi) = \tilde f(\lambda^a(\phi))\}$

is an invariant subspace of $L^2(\Phi, \nu)$ under the regular representation $U(G^a)$.
8. Experimental basis and the Hilbert space of a single experiment. Up
to now the discussion has been largely in terms of models and abstract
parameters. Now we introduce observations in more detail. We have already
stressed that in a given situation we have a choice between different
experiments/questions a. In this section we give a general discussion fixing
this experiment, and hence fixing the parametric function $\lambda(\phi)$. Given a
measurement instrument, this will lead to a statistical model $P^\lambda$.

In this section we will need to introduce some statistical concepts; for a 
more thorough treatment, see, for example, [42]. 

We use the ordinary concept of sufficiency, repeated for convenience: 

Definition 3. A random variable $t = t(\omega)$; $\omega \in S$ connected to a model
$P^\lambda$ is called sufficient if the conditional distribution of each other variable
y, given t, is independent of the parameter $\lambda$.

A sufficient statistic t is minimal if all other sufficient statistics are
functions of t. It is complete if

(7) $E^\lambda(h(t)) = 0$ for all $\lambda$ implies $h(t) = 0$.

It is well known that a minimal sufficient statistic always exists and is
unique except for invertible transformations, and that every complete
sufficient statistic is minimal. If the statistical model has a density belonging to
an exponential class

$b(y)\, d(\lambda)\, e^{c(\lambda)' t(y)},$

and if $c(\Lambda) = \{c(\lambda) : \lambda \in \Lambda\}$ contains some open set, then the statistic t is
complete sufficient.

Recall that a function $\xi(\lambda)$ is called unbiasedly estimable if $E^\lambda(y) = \xi(\lambda)$
for some y. Given a complete sufficient statistic t, every unbiasedly estimable
function $\xi(\lambda)$ has one and only one unbiased estimator that is a function
of t. This is the unique unbiased estimator with minimum risk under weak
conditions [42]. Thus complete sufficiency leads to efficient estimation.
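
As a small illustration of these statements, consider a model that is assumed here only for the sake of example: n independent Bernoulli trials with success probability p, so that $\lambda = p$, the complete sufficient statistic is the number of successes t, and the crude unbiased estimator is the first observation $y_1$. The sketch below conditions $y_1$ on t by exact enumeration and checks that the result is the function t/n of t, and that it is unbiased. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

n = 4  # number of Bernoulli trials (an arbitrary small choice)

def conditional_mean_of_first(t):
    """E(y_1 | sum = t): given t, all 0/1 sequences with sum t are equally likely,
    whatever p is, so the conditional mean is a plain average over those sequences."""
    seqs = [s for s in product((0, 1), repeat=n) if sum(s) == t]
    return Fraction(sum(s[0] for s in seqs), len(seqs))

# The estimator h(t) obtained by conditioning the crude estimator y_1 on t.
rao_blackwell = {t: conditional_mean_of_first(t) for t in range(n + 1)}

def expectation_of_h(p):
    """E^p(h(t)) with t binomial(n, p), computed exactly with rational arithmetic."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) * rao_blackwell[t]
               for t in range(n + 1))

unbiased = all(expectation_of_h(p) == p
               for p in (Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)))

print(rao_blackwell, unbiased)
```

That the conditional mean does not involve p is exactly the sufficiency of t in Definition 3; completeness is what guarantees that no other function of t is unbiased for p.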

Assumption 4. For each $a \in \mathcal{A}$ the experiment can be chosen in such a
way that there is a complete sufficient statistic $t^a$ under the model $P^{\lambda^a}$.

For the rest of this section we fix such an experiment and drop the index
a. We write D for $G^a$, which will be a fixed group on the common sample
space S, but also acts on the selected parameter space.



Definition 4. The Hilbert space K is defined as the set of all functions
h(t) such that $h(t) \in L^2(S, P)$ and $f(\phi) = E^{\lambda(\phi)}(h(t)) \in L^2(\Phi, \nu)$.






In this definition the function h is assumed to be complex-valued. It is
easy to see that (7) holds for complex functions if and only if it holds for
real-valued functions.

A sufficient condition for $f \in L^2(\Phi, \nu)$ is that $\int E^{\lambda(\phi)}(|h(t)|^2)\, \nu(d\phi) < \infty$.
Since it is defined as a closed subspace of a Hilbert space, the Hilbert space
property of K is seen to hold.

Let then the group D be acting upon the sample space S, on the parameter
space $\Lambda$ and on the total parameter space $\Phi$. Recall the brief discussion of
group representations in Section 7. In particular, recall the definition of the
space $V_\lambda$, an invariant space under the regular representation of the group
D on $L^2(\Phi, \nu)$.

Proposition 1. Each space K is an invariant space for the regular
representation of the observational group D on $L^2(S, P)$, that is, under
$U(g)h(t) = h(tg)$; $g \in D$.

Proof. If t is sufficient under the model $P^\lambda$, and D is the group acting
on the sample space, then tg given by $(tg)(\omega) = t(\omega g)$ is sufficient for all
$g \in D$. This is proved by a simple exercise using (2). Also, if t is complete,
then tg must be complete; hence the two must be equivalent. The norm
conditions are easy to verify. Therefore K is invariant under D. □

Consider now the operator A from K to $V_\lambda \subset L^2(\Phi, \nu)$ defined by

(8) $(Ay)(\lambda(\phi)) = \int y(\omega)\, P^{\lambda(\phi)}(d\omega) = E^{\lambda(\phi)}(y),$

using again the (reduced) model $P^\lambda(d\omega)$ corresponding to the experiment a.
In the following it will be important to use K to construct a Hilbert space
related to the parameter space.

Definition 5. Define the space L by L = AK. 

By the definition of a complete sufficient statistic, the operator A will
have a trivial kernel as a mapping from K onto AK. Hence this mapping is
one-to-one. It is also continuous and has a continuous inverse. (See below.)
Hence L is a closed subspace of $L^2(\Phi, \nu)$, and therefore a Hilbert space.
Note also that L is the space in $L^2(\Phi, \nu)$ of unbiasedly estimable functions
with estimators in $L^2(S, P)$. It is in general included in the space $V_\lambda$ of all
functions of the parameter $\lambda$.

Proposition 2. The space L is an invariant subspace of $L^2(\Phi, \nu)$ for
the regular representation of the group D on $L^2(\Phi, \nu)$.

Proof. Assume that $\xi(\lambda) = E^\lambda(y)$ is unbiasedly estimable. Then also
$\eta(\lambda) = \xi(\lambda g) = E^{\lambda g}(y) = E^\lambda(yg)$ is unbiasedly estimable, so L is an invariant
space under the regular representation U of D, defined by $U(g)f(\lambda) = f(\lambda g)$.
□

A main result is now: 

Theorem 1. The spaces $K \subset L^2(S, P)$ and $L \subset L^2(\Phi, \nu)$ are unitarily
related. Also, the regular representations of the group D properly defined on
these spaces are unitarily related.

Proof. We will show that the mapping A can be replaced by a unitary
map in the relation L = AK.

Recall that the connection from the observation group to the parameter
group D is given from the model by

(9) $P^{\lambda g}(B) = P^\lambda(Bg^{-1}); \quad g \in D.$

Using the definition (8) and the connection (9), we find the following
relationships. We assume that the random variable $y(\cdot)$ belongs to $K \subset L^2(S, P)$
and that $\bar U$ is chosen as a representation on the invariant space L.
Then

(10) $\bar U(g)Ay(\lambda) = \int y(\omega)\, P^{\lambda g}(d\omega) = \int y(\omega)\, P^\lambda(d\omega g^{-1}) = \int y(\omega g)\, P^\lambda(d\omega) = A\tilde U(g)y(\lambda),$

where $\tilde U$ is the representation on K given by $\tilde U(g)y(\omega) = y(\omega g)$, that is, the
regular representation on $L^2(S, P)$ restricted to this space.

Thus $\bar U(g)A = A\tilde U(g)$ on K.

Hence

$U(g) = \bar U(g) = A\tilde U(g)A^{-1}; \quad g \in D.$

Recall that the action of D on $\Lambda$ is defined by $(\lambda g)(\phi) = \lambda(\phi g)$, and that
$\bar U(g) = U(g)$ on $V_\lambda$. Here $U(g)f(\phi) = f(\phi g)$ when $f \in V_\lambda$ and $g \in D$.

By Naimark and Stern ([47], page 48), if two representations of a group
are equivalent, they are unitarily equivalent. (The result there is formulated
for the finite-dimensional case, but the proof is valid in general.) Hence for
some unitary C we have

(11) $\bar U(g) = C\tilde U(g)C^{-1}.$

Since the unitary operators in this proof are defined on K and L,
respectively, it follows that these spaces are related by L = CK.
Definition 4 may also be coupled to the operator A and to an arbitrary
Hilbert space K' of sufficient statistics, which for instance may be the whole
space $L^2(S, P)$. First let

(12) $M = \{y \in K' : E^\lambda y = 0 \text{ for all } \lambda\}.$

Then K may be considered as the factor space K'/M, that is, the equivalence
classes of the old K' with respect to the linear subspace M (cf. [47], I.2.10IV).

Here is a proof of this fact: Let $\xi \in AK'$, such that $\xi(\lambda) = E^\lambda(y)$ for some
$y \in K'$. Then y is an unbiased estimator of the function $\xi(\lambda)$. By Lehmann
and Casella ([42], Lemma 1.10), $\xi(\lambda)$ has one and only one unbiased estimator
which is a function h(t) of t. Then every unbiased estimator of $\xi(\lambda)$ is
of the form y = h(t) + x, where $x \in M$; this constitutes an equivalence class.
On the other hand, every h(t) can be taken as such a y. □

9. The parametric Hilbert space of a selected experiment. Return to the
situation where one selects an experiment a among a class of experiments
$\mathcal{A}$. Corresponding to this choice we now have a parametric Hilbert space $L^a$
and an observational Hilbert space $K^a$. This models a certain measurement
apparatus, and in many cases one would expect that the parameter space,
and hence the space $L^a$, will represent some intrinsic property of nature,
and therefore be independent of the choice of measurement apparatus.

However, to cover all cases, and to get a unique definition, we will define
the parametric Hilbert space connected to question $a \in \mathcal{A}$ through a special
choice of measurement apparatus.

Definition 6. (i) Before any experiment is done, $\lambda^a$ is just the name
of some parameter. After the experiment, we have some estimate $\hat\lambda^a$ of this
parameter. The experiment is called perfect if experimental error can be
neglected, so that $\hat\lambda^a$ is the realized value of the parameter in this experiment.

(ii) Define the Hilbert space $H^a$ connected to question $a \in \mathcal{A}$ as the space
$L^a$ for a perfect experiment with parameter $\lambda^a$.

One remark is that even in the perfect case it may be important to
distinguish between a parameter and its realized value. In the electron spin
case, a perfect measurement means simply that the Stern-Gerlach apparatus
functions without any error.

We will see later that under natural assumptions a nonperfect experiment
may be related to the same space $H^a$.

Proposition 3. With the above definitions the space $H^a$ is just the
space $V_{\lambda^a}$ of functions f of $\lambda^a(\cdot)$ such that $f(\phi) = f(\lambda^a(\phi)) \in L^2(\Phi, \nu)$.

Proof. If f is arbitrary and the experiment is perfect, then $\int |f(\hat\lambda^a)|^2\, dP =
|f(\lambda^a(\phi))|^2$ is finite. This then follows from Definitions 4, 5 and 6. □

As an example, in the electron spin case, the total parameter $\phi$ is the
spin vector and $L^2(\Phi, \nu)$ corresponds to a measure $\nu$ which is uniform on
any shell, and where any measure on $|\phi|$ can be used. Let $\lambda^a(\phi) = \mathrm{sign}(a \cdot \phi)$.
Then $H^a$ is simply the space of functions of $\lambda^a(\phi)$, a two-dimensional space.
Specifically, $H^a$ is the space of functions of $\phi$ which are constant on the two
half-spaces separated by a plane through the origin perpendicular to the
vector a.

All this indicates that our discussion could have been simplified by con- 
centrating on the parameter space. Our reasons for nevertheless giving a full 
treatment involving the sample space have been given in the Introduction. 

10. The quantum-theoretical Hilbert space. Our task in this section is
to tie the spaces $H^a$ together. Our essential point of departure here is that
the parameter spaces of the different experiments have a similar structure.
Then it is not unreasonable to assume that they can be transformed over
to each other by some element of the basic group G. This will not give
the most general case of the quantum-mechanical formalism, but gives a
treatment which includes qubits, higher spins, several particles and the most
important cases of entanglement, a phenomenon which is much discussed in
the quantum-mechanical literature.

Assumption 5. For each pair of experiments $a, b \in \mathcal{A}$ there is an
element $g_{ab}$ of the basic group G which induces a correspondence between the
respective parameters,

(13) $\lambda^b = \lambda^a g_{ab}$ or $\lambda^b(\phi) = \lambda^a(\phi g_{ab}).$

This assumption is fairly strong, and it makes the task of connecting the 
spaces really simple. On the other hand, it seems to be satisfied in concrete 
cases. The same assumption will be needed in Section 12. 

In the electron spin case $\Phi$ was a space of vectors, and G was the rotation
group together with changes of scale. Then (13) holds if $g_{ab}$ is any rotation
transforming a to b.

If (13) holds for transformations on some component spaces, it also holds 
for the cartesian product of these spaces when the relevant cartesian product 
of groups is used. 

Another interesting relation is connected to Assumption 5 in the following
way: (13) implies that one ought to have $\lambda^b g^b = \lambda^a g^a g_{ab}$ for some $g^b \in G^b$.
Hence it follows that $\lambda^a g_{ab} g^b = \lambda^a g^a g_{ab}$, so $g^a$ and $g_{ab} g^b g_{ab}^{-1}$ act in the same
way on $\lambda^a$. One can give many examples of group transformations where
$g^a = g_{ab} g^b g_{ab}^{-1}$ holds in general, giving an isomorphism between the groups
$G^a$ and $G^b$.

Assumption 5 will be crucial in connecting the Hilbert spaces $H^a$ for the
different experiments. First, from the construction of the Hilbert spaces, $H^a$
is a space of functions of $\lambda^a(\phi)$, and $H^b$ is a space of functions of $\lambda^b(\phi)$.
Furthermore, the spaces are constructed in the same way. Specifically, if
$f^a(\phi) = f(\lambda^a(\phi))$ and $f^b(\phi) = f(\lambda^b(\phi))$, then by (13) we have

(14) $f^b(\phi) = f^a(\phi g_{ab}) = U(g_{ab}) f^a(\phi).$

This implies:

Theorem 2. (a) There is a connection between the spaces $H^a$ and $H^b$
given by

(15) $H^b = U(g_{ab}) H^a.$

(b) There are a Hilbert space H and for each $a \in \mathcal{A}$ a unitary
transformation $E_a$ such that $H^a = E_a H$.

(c) For any experiment satisfying Assumption 4 and such that the
parametric Hilbert space $L^a$ is equal to $H^a$, there are unitary transformations
$F_a$ such that the observational Hilbert spaces satisfy $K^a = F_a H$.

Proof. (a) Proved above. 

(b) Obvious from (15). The space H can be chosen as any fixed $H^c$.

(c) From (a) and Theorem 1. □ 

Now introduce: 

Assumption 6. The group G is the smallest group containing all the
subgroups $G^a$.

From this we get:

Theorem 3. H is an invariant space for some abstract representation
W of the whole group G.

Proof. It follows from Proposition 2 that $H^a$ is an invariant space for
the group $G^a$.

This can now be extended. Observe first that

(16) $W(g_1 g_2 g_3) = E_a^\dagger U^a(g_1) E_a\, E_b^\dagger U^b(g_2) E_b\, E_c^\dagger U^c(g_3) E_c$






gives a representation on H of the set of elements in G that can be written
as a product $g_1 g_2 g_3$ with $g_1 \in G^a$, $g_2 \in G^b$ and $g_3 \in G^c$.

Continuing in this way, using Assumption 6, implying that the group G
is generated by $\{G^a; a \in \mathcal{A}\}$, we are able to construct a representation W of
the whole group G on the space H. In particular, one is able to take H as
an invariant space for a representation of this group. □

As an example, the two-dimensional Hilbert space of a particle with spin 
is always an (irreducible) invariant space for the rotation group. This de- 
termines to a large extent H, if we in addition assume H to be as small 
as possible. In general, the requirement that H should be a representation 
space for G may put a constraint on the dimension of H. 

The construction above gives a concrete representation of the
quantum-mechanical Hilbert space. Since all Hilbert spaces of the same dimension are
unitarily equivalent, other representations, or just an abstract representation,
may be used in practice. This is sufficient to give the Born formula
as proved below, and through this the ordinary quantum formalism. But the
concrete representation facilitates interpretation.

For our construction, the unitary connection (15) between the Hilbert
spaces for single experiments is the most important premise. This can easily
also be related to the space-time issue. For example, let $\xi$ be the theoretical
position, $\pi$ the theoretical momentum, and let $H^1$ and $H^2$ be the corresponding
$L^2$-spaces of parametric functions. Then we can consider the unitary
transformation from $H^1$ to $H^2$ given for some constant $\hbar$ by



and in this way introduce a common Hilbert space. This can be connected to
the relevant group, namely the group of space translations together with the
Lorentz group, and it can be argued that $\hbar$ should be a universal constant.
This will be further discussed in [36]. From physics it is known that $\hbar =
1.055 \cdot 10^{-34}$ Js.

11. Operators and states. So, by what has just been proved, for each
a the Hilbert space $H^a$ of unbiasedly estimable functions of $\lambda^a$ can be put
in unitary correspondence with a common Hilbert space H. From now on
we shall make an assumption which is common in elementary quantum
mechanics, but which is very restrictive from a statistical point of view.

Assumption 7. Each reduced (maximal) parameter $\lambda^a$ takes only a
finite or denumerably infinite number of values $\lambda_k^a$.

Lemma 2. These values can be arranged such that each $\lambda_k = \lambda_k^a$ is the
same for all a (k = 1, 2, ...).

Proof. By Assumption 5

$\{\phi : \lambda^b(\phi) = \lambda_k^b\} = \{\phi : \lambda^a(\phi g_{ab}) = \lambda_k^b\} = \{\phi : \lambda^a(\phi) = \lambda_k^b\}\, g_{ba}.$

The sets in brackets on the left-hand side here are disjoint with union $\Phi$.
But then the sets on the right-hand side are disjoint with union $\Phi g_{ba} = \Phi$,
and this implies that $\{\lambda_k^b\}$ gives all possible values of $\lambda^a$. □

In spite of Lemma 2, since in any statistical model a parameter can be
changed to any one-to-one function of it, we may sometimes use the notation
$\lambda_k^a$ in order to have the most general treatment.

In the finite case Assumption 7 implies that $G^a$, as acting upon $\Lambda^a$, is a
group of permutations, and that the corresponding invariant measure is the
counting measure.

Recall that the Hilbert space H is chosen as one fixed space $H^c$. In this
space let $f_j^c(\phi)$ be defined as the trivial function which equals 1 when $\lambda^c(\phi) =
\lambda_j$, otherwise 0. These are eigenfunctions of the operator $S^c$ defined by
$S^c f(\phi) = \lambda^c(\phi) f(\phi)$. In a different space $H^a$ these functions correspond to
$f_j^a(\phi) = f_j^c(\phi g_{ca}) = U(g_{ca}) f_j^c(\phi)$. Now define vectors in H by

(17) $v_j^a = W(g_{ca}) f_j^c,$

where W is the representation defined by (16). These are eigenvectors of the
selfadjoint operator $T^a = W(g_{ca}) S^c W(g_{ac})$ with eigenvalues $\lambda_j$.

An eigenvector $v_j^a$ represents the statement that the parameter $\lambda^a$ has
been measured with a perfect measurement that has given the value $\lambda_j$.

In general it is not true that all unit vectors of H can be given such
an interpretation. Among other things one has to take into account what
are called superselection rules: For an absolutely conserved quantity $\mu$, the
linear combinations of eigenvectors corresponding to different eigenvalues of
the operator associated to $\mu$ are not possible state vectors. Superselection
rules are well known among physicists, but they are not always stressed in
textbooks in quantum mechanics.

In [35], Theorem 6 and Lemma 2, we proved the following under the
assumption that the unitary group generated by $\{W(g)\}$ and the phase
factors is transitive on the component spaces $H_r$ below:

Theorem 4. There is a decomposition of H of the form $H = H_1 \oplus H_2 \oplus \cdots$,
where each $H_r$ is an irreducible invariant space under the group G.
Assume that the unitary group generated by $\{W(g)\}$ and the phase factors
$e^{i\alpha}$ is transitive on each component $H_r$. Then all unit vectors of each $H_r$
are unitarily equivalent to some $f_i^b$, an indicator of an event $\lambda^b = \lambda_i^b$. On the
other hand, if two such indicators, say $f_i^b$ and $f_j^c$, are unitarily equivalent to
the same $v \in H_r$, and the relevant unitary transformation can be considered
as a subrepresentation of the regular representation, then there is a
one-to-one function F such that $\lambda^c = F(\lambda^b)$ and $\lambda_j^c = F(\lambda_i^b)$.
In simple terms a state is characterized by the fact that a (maximal)
perfect measurement is performed, and this has led to some value of the
corresponding maximal parameter. Concretely: A perfect experiment $a \in \mathcal{A}$
has led us to consider the Hilbert space $H^a$, and the result $\lambda^a = \lambda_k$ is exactly
characterized by the indicator function $f_k^a$. Translated to the H-space, the
state given by the information $\lambda^a = \lambda_k$ is then characterized by the vector
$v_k^a$.

Corollary 1. Under the assumptions of Theorem 4, all unit vectors
of each irreducible space $H_r$ can be taken as state vectors with the following
interpretation: A question $a \in \mathcal{A}$ (or more precisely: What is the value of
$\lambda^a$?) has been asked, and the answer is given by the realized value $\lambda^a =
\lambda_k$, or in other words: A perfect measurement corresponding to the reduced
parameter $\lambda^a$ has been performed, and the result is $\lambda^a = \lambda_k$.

This is consistent with the well-known quantum-mechanical interpretation
of a state vector. In our treatment, this interpretation of a state as a
question-answer pair is crucial.

The operator $T^a$ may be written

(18) $T^a = \sum_k \lambda_k\, v_k^a v_k^{a\dagger}.$

These operators are self-adjoint, and they satisfy the trivial relation 

Using the results of this section to construct the joint state vector for a
system consisting of several partial systems, with symmetries only within
the partial systems, one follows the recipe $v_{i_1 i_2 i_3}^{a_1 a_2 a_3} = v_{i_1}^{a_1} \otimes v_{i_2}^{a_2} \otimes v_{i_3}^{a_3}$, where
it is assumed that system k is in state $\lambda^{a_k} = \lambda_{i_k}$ for k = 1, 2, 3. By time
development under interaction, as described by the Schrödinger equation,
or by other means, other, entangled, multicomponent states will occur. This
will be further discussed in [36] and elsewhere.
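
Returning to the spectral form (18), the electron spin case gives a simple check. The sketch below assumes, as in Section 5, that the operator for the direction a is $a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$ with eigenvalues $\pm 1$, and it builds the eigenvector for $-1$ as the $+1$ eigenvector for the opposite direction. The function names are hypothetical, and a is assumed to be a unit vector.

```python
import math

def eigvec_plus(a):
    """Unit eigenvector of a.sigma for eigenvalue +1 (a is assumed to be a unit vector)."""
    ax, ay, az = a
    if az > -1 + 1e-12:
        v = (complex(1 + az, 0.0), complex(ax, ay))
    else:
        v = (0j, 1 + 0j)
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return (v[0] / norm, v[1] / norm)

def projector(v):
    """The rank-one operator v v^dagger as a 2x2 nested list."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def spectral_operator(a):
    """Right-hand side of (18) with eigenvalues +1 and -1."""
    p_plus = projector(eigvec_plus(a))
    # The eigenvector for -1 is the +1 eigenvector of the opposite direction.
    p_minus = projector(eigvec_plus(tuple(-x for x in a)))
    return [[p_plus[i][j] - p_minus[i][j] for j in range(2)] for i in range(2)]

def a_dot_sigma(a):
    """The matrix a_x*sigma_x + a_y*sigma_y + a_z*sigma_z built from (3)."""
    ax, ay, az = a
    return [[complex(az, 0.0), complex(ax, -ay)],
            [complex(ax, ay), complex(-az, 0.0)]]

a = (2 / 3, 2 / 3, 1 / 3)
T = spectral_operator(a)
S = a_dot_sigma(a)
max_deviation = max(abs(T[i][j] - S[i][j]) for i in range(2) for j in range(2))
print(max_deviation)
```

The printed deviation should be at rounding-error level, which reflects that the two projectors are orthogonal and sum to the identity.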

12. Born's formula. We have now obtained a statistical interpretation of
the quantum-mechanical Hilbert space: Under the assumptions of Theorem 4
all vectors in that space can be equivalently characterized as question-answer
pairs and, furthermore, the Hilbert space is invariant under a suitable
representation of the basic group G.

To complete the derivation of the formalism of quantum mechanics from
the statistical parameter approach, the most important task left is to arrive
at the Born formula, which gives the probability of transition from one state
to another. The fact that such a formula exists is amazing, and must be
seen as a result of the symmetry of the situation together with the limitation
imposed by the Hilbert space. Even though I use a different approach,
my own result is related to recent attempts to link the formula to general
decision theory: An interesting development which goes in this direction was
recently initiated by Deutsch [22]. The approach of Deutsch has been
criticized by Finkelstein [26], by Barnum et al. [8] and by Gill [28], who gave a
constructive set of arguments using three reasonable assumptions.

In this section I will concentrate on the case with one irreducible compo- 
nent in the Hilbert space, that is, I will neglect superselection rules. This 
is really no limitation, since transitions between different components are 
impossible. 

What I am going to prove is a result connecting two different perfect
experiments in the same system. Assume that we know from the first perfect
experiment that $\lambda^a = \lambda_k$. Next assume that we perform another perfect
experiment $b \in \mathcal{A}$. In both cases, the notion of perfect measurement means
that measurement error can be neglected. More realistic experiments are
treated in Theorems 7 and 8 below. In the perfect case it turns out that we
can find a formula for

$P(\hat\lambda^b = \lambda_i \mid \lambda^a = \lambda_k) = P(\lambda^b = \lambda_i \mid \lambda^a = \lambda_k)$

which depends only upon the state vectors $v_k^a$ and $v_i^b$.

This formula has a large number of important consequences in quantum
mechanics and, as already said, it can be argued for in different ways. I will
prove it from the following:

Assumption 8. (i) The transition probabilities exist in the sense that
the probabilities above do not depend upon anything else.

(ii) The transition probability from $\lambda^a = \lambda_k$ in the first perfect
experiment to $\lambda^a = \lambda_k$ in the second perfect experiment is 1.

(iii) For all a, b, c we have that $\lambda^a(\phi g_{bc})$ is a valid experimental
parameter.

(iv) For all a, b, c, i, k we have

$P(\lambda^b(\phi) = \lambda_i \mid \lambda^a(\phi) = \lambda_k) = P(\lambda^b(\phi g_{bc}) = \lambda_i \mid \lambda^a(\phi g_{bc}) = \lambda_k).$

Remark. (1) Assumption 8 is an important instance where the symme- 
try group setting is used in an essential way to derive a result that does not 
itself involve the symmetry group G. 

(2) Crucial assumptions will also be Assumption 3, that a common sample 
space can be used in all experiments, and Assumption 5. 

(3) We have $\lambda^b(\phi g_{bc}) = \lambda^c(\phi)$, so three experimental parameters are
included in Assumption 8.

(4) In the proof below we transform a single experiment by some element 
of G. The use of the transformation g on t is then justified by: 

Lemma 3. Consider the homomorphism from the sample space transformations
to the parameter space transformations given by

$P^{\lambda g}(y \in B) = P^{\lambda}(y \in Bg^{-1}) = P^{\lambda}(yg \in B)$.

When $y = t$ is a complete sufficient statistic, this is an isomorphism, so that
one can let $g$ be defined on the parameter space to begin with.

Proof. Assume that there are two different group elements $g_1$ and $g_2$ of
the sample space transformations such that

$P^{\lambda g}(t \in B) = P^{\lambda}(tg_1 \in B) = P^{\lambda}(tg_2 \in B)$.

Then for all $\lambda$ and for all functions $h$ we have

$E^{\lambda}(h(tg_1)) = E^{\lambda}(h(tg_2))$.

By the definition of a complete sufficient statistic it then follows that
$tg_1 = tg_2$. □

Born's formula is given by: 

Theorem 5. Under the assumptions above and the assumptions of Theorem 4
the transition formula is as follows:

(19) $P(\lambda^b = \lambda_i \mid \lambda^a = \lambda_k) = |v_k^{a\dagger} v_i^b|^2$.

The proof will depend upon a recent variant [17, 18] of a well-known 
mathematical result given by Gleason [30]. One advantage of this recent 
variant is that it is also valid for dimension 2, when the ordinary Gleason
theorem fails. 

The Busch-Gleason theorem. Consider any Hilbert space $H$. Define the set
of effects as the set of operators on this Hilbert space with eigenvalues in
the interval $[0,1]$. Assume that there is a generalized probability measure
$\pi$ on these effects, that is, a set function satisfying

$\pi(E) \ge 0$ for all $E$,
$\pi(I) = 1$,
$\sum_i \pi(E_i) = \pi(E)$ for effects $E_i$ with sum $E$.

Then $\pi$ is necessarily of the form $\pi(E) = \mathrm{tr}(\rho E)$ for some
positive, self-adjoint, trace 1 operator $\rho$.
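
As a numerical illustration of the conclusion of this theorem (not of its proof), the following sketch checks that a set function of the form $\pi(E) = \mathrm{tr}(\rho E)$ satisfies the three defining conditions in a two-dimensional example. The density operator `rho` and the effects `E1` and `E2` are hypothetical values chosen for illustration only; they do not come from the paper.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gen_prob(rho, effect):
    """pi(E) = tr(rho E); the trace is real for Hermitian rho and E."""
    prod = matmul(rho, effect)
    return sum(prod[i][i] for i in range(len(prod))).real

# Assumed example: equal mixture of spin up along z and spin up along x.
rho = [[0.75, 0.25], [0.25, 0.25]]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Assumed effects (Hermitian, eigenvalues in [0, 1]); their sum is an effect too.
E1 = [[0.5, 0.0], [0.0, 0.2]]
E2 = [[0.3, 0.1], [0.1, 0.4]]

normalized = abs(gen_prob(rho, identity) - 1.0) < 1e-12
additive = abs(gen_prob(rho, add(E1, E2))
               - (gen_prob(rho, E1) + gen_prob(rho, E2))) < 1e-12
nonnegative = min(gen_prob(rho, E1), gen_prob(rho, E2)) >= 0.0
print(normalized, additive, nonnegative)
```

The content of the theorem is the converse direction, which a finite check of this kind cannot establish: every set function with these three properties arises from some $\rho$.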

The effects involved in the Busch-Gleason theorem turn out to have a
rather straightforward statistical interpretation. Look at an experiment $b$,
corresponding to a parameter $\lambda^b$ which can take the values $\lambda_i$.
Let the result of this experiment be given by a discrete complete sufficient
statistic $t$, thus allowing for an experimental error. Let $t$ have a
likelihood

$p_i(t) = P(t \mid \lambda^b = \lambda_i)$.

The choice of experiment $b$, the set of possible parameter values
$\{\lambda_i\}$ and the result $t$ again constitute a question-and-answer set,
but now in a more advanced form. The point is that the answer is uncertain, so
that all these elements together with the likelihood function must be included
to specify the question-and-answer.

Proposition 4. Exactly this information, the experiment $b$, the possible
answers and the statistic $t$ can be recovered from the effect defined by

(20) $E = \sum_i p_i(t) v_i^b v_i^{b\dagger}$.

On the other hand, for fixed $t$ every effect $E$ can be written in the
spectral form (20).

Proof. This is a spectral decomposition from which the eigenvalues
$p_i(t)$ and the eigenvectors $v_i^b$ can be recovered. As discussed before, the
eigenvectors correspond to the question-and-answers for the case without
measurement errors, and from the likelihood the minimal sufficient observator
$t$ can be recovered. The last part is obvious. □

All this was discussed from a slightly different perspective in [35] for the 
case of a two-dimensional Hilbert space. 
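
The recovery step in Proposition 4 can be made concrete with a small sketch. It builds an effect of the spectral form $\sum_i p_i(t) v_i v_i^{\dagger}$ for one fixed observed $t$ and then reads the likelihood values back off as eigenvalues. The eigenvectors (spin along the x-axis) and the likelihood values 0.9 and 0.2 are assumptions made for illustration.

```python
import math

def outer(u, v):
    """Matrix u v^dagger for vectors given as lists."""
    return [[a * b.conjugate() for b in v] for a in u]

def effect_from_likelihood(likelihood, vectors):
    """E = sum_i p_i(t) v_i v_i^dagger for a fixed observed t."""
    n = len(vectors[0])
    e = [[0.0] * n for _ in range(n)]
    for p, v in zip(likelihood, vectors):
        vv = outer(v, v)
        e = [[e[r][c] + p * vv[r][c] for c in range(n)] for r in range(n)]
    return e

def apply(m, v):
    """Matrix-vector product m v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

s = 1 / math.sqrt(2)
vectors = [[s, s], [s, -s]]   # assumed eigenvectors of experiment b
likelihood = [0.9, 0.2]       # assumed p_i(t) = P(t | lambda^b = lambda_i)
E = effect_from_likelihood(likelihood, vectors)

# Each v_i is an eigenvector of E with eigenvalue p_i(t), so the likelihood
# values are recoverable from E together with its eigenvectors.
recovered = [apply(E, v)[0] / v[0] for v in vectors]
print(recovered)
```

When two likelihood values coincide the eigenvectors are no longer determined uniquely by $E$, which is one reason the nondegenerate case is the simplest one to think about.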

Consider now the situation where a quantum system is known to be in a
state given by $v_k^a$, that is, a perfect experiment $a$ has been performed
with result $\lambda^a = \lambda_k$. Then make a new experiment $b$, but let
this experiment be nonperfect. We require the probability $\pi(E)$ that the
result of the latter experiment shall be $t$, corresponding to the effect $E$
given by (20) with likelihood $p_i(t) = P(t \mid \lambda^b = \lambda_i)$. For
this situation it is natural to define

(21) $\pi(E) = P(t \mid \lambda^a = \lambda_k)$.

An important point in our development is that under Assumption 8, this
$\pi$, when ranging over all the effects $E$, will be a generalized
probability. The crucial result is the following:

Proposition 5. Under Assumption 8, if $E_1$, $E_2$ and $E_1 + E_2$ all are
effects, then

$\pi(E_1 + E_2) = \pi(E_1) + \pi(E_2)$.

Proof. Let $E_1 = E$ be given by (20), and let

$E_2 = \sum_j q_j(t) v_j^c v_j^{c\dagger}$

for another experiment $c$ with another likelihood $q_j$.

First we remark that the relations $\pi(rE_1) = r\pi(E_1)$ and
$\pi(E_1 + E_2) = \pi(E_1) + \pi(E_2)$ are trivial when $E_1$, $E_2$, $rE_1$
and $E_1 + E_2$ are all effects and all $v_j^c = v_j^b$.

We now turn to the general case. The statistic $t$ may then be assumed
to be sufficient and complete with respect to both likelihoods. By
Assumption 5 the parameters of the two experiments are connected by a group
transformation. Then by imitating the argument in the proof of Lemma 3,
a complete sufficient statistic for experiment $b$ can be transformed by an
isomorphic group transformation to a complete sufficient statistic for
experiment $c$; hence the complete sufficient statistics for the two
experiments may be assumed identical.

Consider the experiment $E_3$ defined by selecting experiment $E_1$ with
probability 1/2 and experiment $E_2$ with probability 1/2. Since the same
measurement apparatus was used in both experiments, one can arrange things in
such a way that the person reading $t$ for experiment $E_3$ does not know which
of the experiments $E_1$ or $E_2$ was chosen. This arrangement is necessary in
order to prevent the conditionality principle from disturbing our argument
for this situation; see [3] and the response to these comments. We can regard
$E_3$ as a genuinely new experiment here.

Now use Assumption 5. From this assumption there exists a group element
$g_{bc}$ such that $\lambda^c(\phi) = \lambda^b(\phi g_{bc})$. We can, and
will, rotate experiment $b$ in such a way that all final state vectors coincide
with those of experiment $c$. Then from Assumption 8, the transition
probability to experiment $E_2$ is the same as if a rotated initial state was
chosen and the state vectors $v_i^b$ were chosen, but with a different
likelihood $q_i'(t) = q_i(tg_{bc})$.

From this perspective, the experiment $E_3$ can also be related to the same
state vectors, but with a likelihood

(22) $r_i(t) = \frac{1}{2}(p_i(t) + q_i'(t))$.

The statistic $t$ will be sufficient relative to this likelihood, but may not
be complete or minimal. However, this is not needed for our argument.
This gives

(23) $\pi(E_3) = \frac{1}{2}\pi(E_1) + \frac{1}{2}\pi(E_2)$

for experiments transformed to have the same final states.

We can now transform back so that all three experiments have the same
initial state. Since experiment $E_3$ in the rotated form had the same
question-and-answer form as the other two experiments, only with a different
likelihood (22), this experiment must also correspond to some effect. Then
from (23), Assumption 8 and the fact that the same sample space is used for
all three experiments both in the original and in the rotated version, the
transition probability must satisfy

(24) $\pi(E_3) = \pi(\frac{1}{2}(E_1 + E_2)) = \frac{1}{2}\pi(E_1) + \frac{1}{2}\pi(E_2)$.

The first equality here obviously holds in the rotated case; then it also holds
when we rotate back. If $E_1 + E_2$ is an effect, the factor 1/2 can be removed
throughout by suitably redefining the likelihood. □

Proposition 6. For fixed initial state $\lambda^a = \lambda_k$, the set
function defined by (21) from the transition probability will under
Assumption 8 be a generalized probability on the final effects.

Proof. The additivity property for a finite number of effects follows by 
induction from Proposition 5. The argument of Proposition 5 can also be 
used with a countable set of effects, so the additivity property for generalized 
effects follows for these set functions. 

It is obvious that $\pi(E) \ge 0$. The limiting effect $I$ corresponds to an
experiment and experimental result with likelihood 1 on each single parameter
value, and it is clear that the transition probability to this effect must be 1
from every initial state. □

Proof of Theorem 5. Fix $a$ and $k$ and hence the state $v_k^a$, interpreted
as $\lambda^a = \lambda_k$. Define $q_{a,k}(v) = \pi_{a,k}(E)$ to be equal to
the transition probability from $v_k^a$ to the effect $E = vv^{\dagger}$ for an
arbitrary state vector $v$, assumed to exist in Assumption 8. Generalize to any
$E$ by (21). By Proposition 6 the conditions of the Busch-Gleason theorem are
satisfied.

By this theorem, for any $v \in H$, we have
$\pi_{a,k}(vv^{\dagger}) = v^{\dagger}\rho v$ for some $\rho$, which is
positive, self-adjoint and has trace 1. This implies
$\rho = \sum_j c_j u_j u_j^{\dagger}$ for some orthonormal set of vectors
$\{u_j\}$. Self-adjointness implies that each $c_j$ is real-valued, and
positivity demands $c_j \ge 0$ for each $j$. The trace 1 condition implies
$\sum_j c_j = 1$.

Inserting this gives
$\pi_{a,k}(vv^{\dagger}) = \sum_j c_j |v^{\dagger}u_j|^2$. Specialize now to
the particular case given by $v = v_k^a$. For this case one must have, by
Assumption 8(ii), $\sum_j c_j |v_k^{a\dagger}u_j|^2 = 1$, and thus

$\sum_j c_j (1 - |v_k^{a\dagger}u_j|^2) = 0$.

This implies for each $j$ that either $c_j = 0$ or $|v_k^{a\dagger}u_j| = 1$.
Since the last condition implies $u_j = v_k^a$ (modulo an irrelevant phase
factor), and this is a condition which can only be true for one $j$, it follows
that $c_j = 0$ for all $j$ other than the one leading to $u_j = v_k^a$, and
$c_j = 1$ for this particular $j$.

Summarizing all this, we get $\rho = v_k^a v_k^{a\dagger}$ and Theorem 5
follows. □

A new challenge is of course to investigate to what extent this result, in 
fact all the results here from Section 11 onward, generalize to the case of 
parameters taking more than a countable set of values. This will possibly
require more advanced mathematical tools, but in that case it also seems 
quite certain that one can draw on known advanced results from quantum 
probability. 

The results above are valid and have relevance also outside quantum theory.
In Section 12.5 of [35] a large-scale example is sketched where, using
Born's formula, the prior probability of a second experiment is found, given 
the result of a first experiment. 

By the same proof, Born's formula can be generalized to
$P(E \mid \lambda^a = \lambda_k) = v_k^{a\dagger} E v_k^a$ for an arbitrary
final effect $E$ [also Theorem 7(a) below]. This gives a transition
probability from any state vector $v_k^a \in H$.

Recall that $H$ was originally defined using perfect experiments. Using
Born's formula, it can be seen that a large class of experiments take the 
same Hilbert space as a point of departure. 

13. Basic formulae of quantum mechanics and of quantum statistics. Our
state concept may now be summarized as follows: To the state
$\lambda^a(\cdot) = \lambda_k$ there corresponds the state vector $v_k^a$, and
these vectors determine the transition probabilities as in (19). The
probability distribution (19) also implies for perfect experiments:

Theorem 6. (a) $E(\lambda^b \mid \lambda^a = \lambda_k) = v_k^{a\dagger} T^b v_k^a$, where $T^b = \sum_j \lambda_j v_j^b v_j^{b\dagger}$.
(b) $E(f(\lambda^b) \mid \lambda^a = \lambda_k) = v_k^{a\dagger} f(T^b) v_k^a$, where $f(T^b) = \sum_j f(\lambda_j) v_j^b v_j^{b\dagger}$.

Thus, in ordinary quantum-mechanical terms, the expectation of every 
observable in any state is given by the familiar formula. 
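
Theorem 6(a) can be checked numerically in the spin-1/2 case. The sketch below computes the expectation once as a weighted sum over Born probabilities and once as the quadratic form $v^{\dagger} T v$. The initial state (spin up along z), the angle `theta` between the two measurement directions, and the restriction to real state vectors in the x-z plane are all assumptions made for illustration.

```python
import math

def inner(u, v):
    """Hermitian inner product u^dagger v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

theta = math.pi / 3            # assumed angle between directions a and b
v_a = [1.0, 0.0]               # initial state: spin up along z
vectors_b = [[math.cos(theta / 2), math.sin(theta / 2)],
             [-math.sin(theta / 2), math.cos(theta / 2)]]
values_b = [1.0, -1.0]         # parameter values lambda_j of experiment b

# Expectation under the Born probabilities |v_a^dagger v_j|^2.
expectation = sum(lam * abs(inner(v_a, v)) ** 2
                  for lam, v in zip(values_b, vectors_b))

# Quadratic form v_a^dagger T v_a with T = sum_j lambda_j v_j v_j^dagger.
T = [[sum(lam * v[r] * v[c].conjugate() for lam, v in zip(values_b, vectors_b))
      for c in range(2)] for r in range(2)]
quadratic_form = sum(v_a[r].conjugate() * T[r][c] * v_a[c]
                     for r in range(2) for c in range(2))
print(expectation, quadratic_form)
```

Both quantities equal $\cos\theta$ up to rounding, which is the usual expectation of a spin component measured at angle $\theta$ to the prepared direction.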

It follows from Theorem 6(a) and from the preceding discussion that the 
first three rules of Isham ([39], page 71), taken there as a basis for quantum 
mechanics, are satisfied. The fourth rule, the Schrödinger equation, will be
discussed in [36]. 

Now turn to nonperfect experiments. In ordinary statistics, a measurement
is a probability measure $P^{\theta}(dy)$ depending upon a parameter $\theta$.
Assume now that such a measurement depends upon the parameter $\lambda^b$,
while the current state is given by $\lambda^a = \lambda_k$. Then as in
Theorem 6(b):

Theorem 7. (a) Corresponding to the experiment $b \in A$ one can define
an operator-valued measure $M$ by
$M(dy) = \sum_j P^{\lambda_j}(dy) v_j^b v_j^{b\dagger}$. Then, given the
initial state $\lambda^a = \lambda_k$, the probability distribution of the
result of experiment $b$ is given by
$P[dy \mid \lambda^a = \lambda_k] = v_k^{a\dagger} M(dy) v_k^a$.

(b) These operators satisfy $M[S] = I$ for the whole sample space $S$, and
furthermore $\sum_i M(A_i) = M(A)$ for any finite or countable sequence of
disjoint elements $\{A_1, A_2, \ldots\}$ with $A = \bigcup_i A_i$.

Theorem 7(b) is easily checked directly. 

A more general state assumption is a Bayesian one corresponding to this 
setting. From Theorem 7(a) we easily find: 

Theorem 8. Let the current state be given by probabilities $\pi(\lambda_k)$
for different values of $\lambda_k$. Then, defining
$\rho = \sum_k \pi(\lambda_k) v_k^a v_k^{a\dagger}$, we get
$P[dy] = \mathrm{tr}[\rho M(dy)]$.

A density operator $\rho$ of such a kind is often used in quantum mechanics;
the definition above gives a precise interpretation. In fact, these results
are the basis for much of quantum theory, in particular for the quantum- 
statistical inference in [7]; for a formulation, see also [39]. 
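
Theorems 7 and 8 can be illustrated together on a two-point sample space. The sketch below builds an operator-valued measure from an assumed noisy measurement model, checks that it sums to the identity as in Theorem 7(b), and compares the trace formula of Theorem 8 with the corresponding mixture of pure-state probabilities from Theorem 7(a). The error probabilities in `noise`, the prior weights in `prior`, and the choice of real state vectors are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
vectors_b = [[s, s], [s, -s]]          # assumed eigenvectors of experiment b
# Assumed measurement model P^{lambda_j}(y) on the sample space {0, 1}.
noise = [{0: 0.1, 1: 0.9}, {0: 0.8, 1: 0.2}]

def povm(y):
    """M({y}) = sum_j P^{lambda_j}(y) v_j v_j^dagger (vectors are real here)."""
    return [[sum(p[y] * v[r] * v[c] for p, v in zip(noise, vectors_b))
             for c in range(2)] for r in range(2)]

def prob_pure(v, y):
    """Theorem 7(a): v^dagger M({y}) v for a real state vector v."""
    m = povm(y)
    return sum(v[r] * m[r][c] * v[c] for r in range(2) for c in range(2))

def prob_mixed(rho, y):
    """Theorem 8: tr[rho M({y})]."""
    m = povm(y)
    return sum(rho[r][c] * m[c][r] for r in range(2) for c in range(2))

states_a = [[1.0, 0.0], [0.0, 1.0]]    # spin up and spin down along z
prior = [0.7, 0.3]                     # assumed probabilities pi(lambda_k)
rho = [[sum(w * v[r] * v[c] for w, v in zip(prior, states_a))
        for c in range(2)] for r in range(2)]

# Theorem 7(b) on the whole sample space: M({0}) + M({1}) = I.
total = [[povm(0)[r][c] + povm(1)[r][c] for c in range(2)] for r in range(2)]
print(total, prob_mixed(rho, 1))
```

The trace formula is linear in $\rho$, so the probability under a prior over initial states is the prior-weighted average of the pure-state probabilities.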

Note that the density matrix $v_k^a v_k^{a\dagger}$ is equivalent to the pure
state $v_k^a$; similarly, a density matrix $v_j^b v_j^{b\dagger}$ is equivalent
to the statement that a perfect measurement giving $\lambda^b = \lambda_j$ has
just been performed. By straightforward application of Born's formula one
gets:

Theorem 9. Assume an initial state $v_k^a$, and assume that a perfect
measurement of $\lambda^b$ has been performed without the resulting value being
known. Then this state is described by a density matrix
$\sum_j |v_k^{a\dagger} v_j^b|^2 v_j^b v_j^{b\dagger}$.

This is related to the celebrated and much discussed projection postulate
of von Neumann. Writing $P_j = v_j^b v_j^{b\dagger}$ and
$\rho = v_k^a v_k^{a\dagger}$ here, the $j$th term in the last formula can be
written $P_j \rho P_j$, which corresponds to a special case of the Dirac-von
Neumann formula [57].
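
The identity between the two forms of the post-measurement density matrix can be checked directly. The sketch below computes $\sum_j |v^{\dagger} v_j|^2 v_j v_j^{\dagger}$ and $\sum_j P_j \rho P_j$ for an assumed initial state (spin up along z) and an assumed second experiment (spin along x), and compares the results entry by entry.

```python
import math

def outer(u, v):
    """Matrix u v^dagger for vectors given as lists."""
    return [[a * b.conjugate() for b in v] for a in u]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
v_a = [1.0, 0.0]                 # assumed initial state
vectors_b = [[s, s], [s, -s]]    # assumed eigenvectors of the second experiment
rho = outer(v_a, v_a)

# Form stated in Theorem 9: Born weights times the projectors v_j v_j^dagger.
weights = [abs(sum(a.conjugate() * b for a, b in zip(v_a, v))) ** 2
           for v in vectors_b]
post_born = [[sum(w * outer(v, v)[r][c] for w, v in zip(weights, vectors_b))
              for c in range(2)] for r in range(2)]

# Projection form: sum_j P_j rho P_j with P_j = v_j v_j^dagger.
post_proj = [[0.0, 0.0], [0.0, 0.0]]
for v in vectors_b:
    P = outer(v, v)
    term = matmul(matmul(P, rho), P)
    post_proj = [[post_proj[r][c] + term[r][c] for c in range(2)]
                 for r in range(2)]
print(post_born, post_proj)
```

In this example both computations give half the identity matrix: after an unread measurement along x, nothing remains of the initial knowledge about the z-component.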

In general we have assumed for simplicity in this section that the state
vectors are nondegenerate eigenvectors of the corresponding operators, meaning
that the parameter $\lambda^a$ contains all relevant information about the
system. This can be generalized, however.

14. The electron revisited. The electron spin is in a way the simplest
possible quantum-mechanical system. The Hilbert space $H$ is two-dimensional,
and $H$ can fruitfully be regarded as an irreducible representation space of
the rotation group. This group can be generated by the matrices $\sigma_x$,
$\sigma_y$ and $\sigma_z$ given by (3).

In the standard quantum-mechanical formulation these three matrices are
taken as basic quantities, observables corresponding to the spin in the x-, y-
and z-directions, respectively. They all have eigenvalues $\pm 1$,
corresponding to the values of these spin observables. The corresponding
eigenvectors are then taken as state vectors for these (perfect) measurement
results.

As a generalization, the observable
$T^a = a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$ for a real-valued unit vector
$a = (a_x, a_y, a_z)$ also has eigenvalues $\pm 1$, and the eigenvectors have a
similar state vector interpretation, corresponding to a spin vector in the
direction $a$.

The transition probabilities between states defined by spin in different 
directions are found from the Born formula, from which (5) is derived. 
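
A minimal sketch of such a transition probability, restricted for simplicity to measurement directions in the x-z plane (an assumption made here so that the state vectors can be taken real). The function names are illustrative. For a direction at angle $\theta$ from the z-axis, the eigenvector with eigenvalue $+1$ of $\sin\theta\,\sigma_x + \cos\theta\,\sigma_z$ is $(\cos(\theta/2), \sin(\theta/2))$.

```python
import math

def spin_state(theta):
    """Eigenvector with eigenvalue +1 of sin(theta) sigma_x + cos(theta) sigma_z."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def born(v, w):
    """Transition probability |v^dagger w|^2 (the vectors are real here)."""
    return abs(sum(a * b for a, b in zip(v, w))) ** 2

def transition(theta_a, theta_b):
    """Probability of spin +1 along direction b, given spin +1 along direction a."""
    return born(spin_state(theta_a), spin_state(theta_b))

print(transition(0.0, math.pi / 2))
```

The result depends only on the angle between the two directions and equals $(1 + a \cdot b)/2$: orthogonal directions give probability 1/2, and opposite directions give probability 0.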

A more direct representation of the spin state of an electron was discussed
in [35]. In agreement with the alternative representation of quantum
mechanics proposed in the present paper, start with a spin vector $\phi$ and
choose a direction $a$ in which the spin component shall be measured. As in
Section 6 it is only possible to measure
$\lambda^a = \mathrm{sign}(\theta^a) = \mathrm{sign}(\phi \cdot a)$.

Define the 3-vector $u = \lambda^a a$. We claim that this vector gives a unique
representation of the spin state of the electron. As has now been stressed
repeatedly, we regard the state as a question-and-answer pair. The question
(what is the spin component in direction $a$?) is given by the chosen vector
$a$; the answer is given by $\lambda^a$. We can recover both these elements
uniquely from the vector $u$, since a spin component $-1$ in the direction $a$
is equivalent to a spin component $+1$ in the direction $-a$.

For those knowing some quantum mechanics, the spin state can also be
represented by the Bloch sphere or Poincaré sphere matrix

$\rho = \frac{1}{2}(I + u \cdot \sigma)$,

where $\sigma$ is a formal 3-vector with components given by the 2-by-2
matrices $\sigma_x$, $\sigma_y$ and $\sigma_z$ above. Obviously, specifying
$\rho$ is equivalent to specifying $u$.

Finally, by conventional quantum mechanics we have $\rho = vv^{\dagger}$,
where $v$ is the ordinary complex two-dimensional Hilbert space state vector,
only defined modulo an arbitrary phase factor for an isolated system. Thus the
spin state can be given in any of four different ways:

(1) as a question $a$ together with an answer $\lambda^a$;

(2) by the 3-vector $u$;

(3) by the Bloch sphere matrix $\rho$;

(4) by the Hilbert space state vector v. 
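
The four descriptions can be converted into one another explicitly. The sketch below goes from a question-and-answer pair to the 3-vector $u$, then to the Bloch sphere matrix $\frac{1}{2}(I + u \cdot \sigma)$, and compares that matrix with $vv^{\dagger}$ for a state vector $v$ built from the polar angles of $u$. The function names, the example direction `a`, and the particular phase convention for $v$ are assumptions made for illustration.

```python
import cmath
import math

def spin_vector(a, answer):
    """u = lambda^a * a for a question a and an answer lambda^a in {+1, -1}."""
    return [answer * x for x in a]

def bloch_matrix(u):
    """rho = (I + u . sigma) / 2 for a real unit 3-vector u."""
    ux, uy, uz = u
    return [[(1 + uz) / 2, (ux - 1j * uy) / 2],
            [(ux + 1j * uy) / 2, (1 - uz) / 2]]

def state_vector(u):
    """A vector v with v v^dagger = rho; the overall phase is chosen arbitrarily."""
    ux, uy, uz = u
    theta = math.acos(max(-1.0, min(1.0, uz)))   # polar angle of u
    phi = math.atan2(uy, ux)                     # azimuthal angle of u
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

a = [0.6, 0.0, 0.8]            # assumed measurement direction (unit vector)
u = spin_vector(a, -1)         # answer -1 in direction a
rho = bloch_matrix(u)
v = state_vector(u)
projector = [[v[r] * v[c].conjugate() for c in range(2)] for r in range(2)]
print(rho, projector)
```

Multiplying `v` by any phase factor leaves `projector` unchanged, which is the sense in which $v$ is only defined modulo a phase while $u$ and $\rho$ are unique.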

The discussion here can be generalized to other density matrices and 
further to the effects of Section 12; see [35]. 

15. Discussion. The treatment of quantum theory given in this paper
is of course still not complete. In [36] two further themes will be discussed
from the present point of view, namely the spacetime structure (including
transformations related to Planck's constant) and the Schrödinger equation,
which gives the time development of the state vector.

Our point of departure here is that both quantum theory and statistical 
theory deal with prediction, both using probability models of some kind. 
In our view, what we have arrived at seems to point at a general theory 
from which both traditional statistical theory and quantum theory emerge 
as special cases. 

A basic premise is that the states of quantum mechanics are related to the
parameter space of statistical models. This is an assumption that we have 
in common with other authors, for instance, Caves, Fuchs and Schack [19]. 
Hidden variable models for quantum mechanics have been criticized in many 
contexts. In my view, a hidden (total) parameter model is a more flexible 
and useful concept. A hidden parameter does not in general have a value; 
in a given situation it can be looked upon more as part of the conceptual 
framework needed to describe the situation. Only by focusing on some given 
function of the hidden total parameter can we obtain a concrete parameter 
on which inference can be made from specific experiments. 

We allow the choice between several complementary experiments/questions
on the same units. Furthermore, we impose symmetry conditions of the form 
often done in statistics, but more complicated because of the choice of ex- 
periment. Finally, we allow model reduction using the orbit index of the 
experimental symmetry group. This leads to essential parts of quantum the- 
ory, and we find that the set of functions of complete sufficient statistics 
for the experiments essentially determines the Hilbert space needed for the 
quantum formulation. 

Large parts of the present theory should in principle be valid on a
macroscopic scale, too. This leads to the question of whether large-scale
situations
can be found which can be related in some way to this theory. Some brief 
examples of related applications can be mentioned. 

As an example of partly complementary parameters, look at different sets 
of orthogonal contrasts in an analysis of variance situation. In randomized 
experiments we have a symmetry group on the sample space leading to cal- 
culations [4] which in fact have some formal resemblance to those of quantum 
theory. 

With moderately complicated issues for a statistical investigation, it is 
always wise to elucidate the issue in question from several angles. This may 
involve performing experiments with different, but related parameters and 
making inference on different, but related parameters. A related case is con- 
ditioning on different ancillary statistics, where a connection to quantum 
theory was hinted at in [5]. 

In [33] it is shown that existing chemometric prediction methods can be
related to rotational symmetry combined with a model reduction of the kind 
discussed in this paper. 

Thus the theory developed here may seem to have something to say to
current applied statistics. These questions must wait for further
developments, however.

John von Neumann once said: "In mathematics you don't understand 
things. You just get used to them" (cited from [11]). By now, generations 
of physicists and mathematicians have gotten used to the formal Hilbert
space approach to quantum theory. And important results have followed 
from this, both applied and theoretical; some of the latter are mentioned 
in the Introduction. This gives overwhelming evidence that quantum theory 
is important and useful. But this in itself does not prove that the ordinary
logical foundation for the theory is the simplest one. Our claim is the
following: Physics is basically an empirical science, and hence one should
work toward a logical foundation that is related to the quantitative
methodology used by other empirical sciences, instead of one suggested by
formal mathematics. This
has been some of the motivation behind the present work, and the results 
obtained seem to confirm that such a link is possible. 

APPENDIX 

A.1. Further properties of group actions. Adding a group to a statistical
model specification is often of interest, and does have consequences; see [42].
First let a group $G$ act on a measurable sample space $S$. Measurability
questions are ignored here, as is common when discussing transformation
groups; a full account of this aspect is given in [56].

The orbits of a group $G$ acting on $S$ are the sets of the form $\omega_0 g$,
where $\omega_0$ is fixed and $g$ runs through $G$. The orbits of the parameter
group induced from $G$ by (2) are defined similarly. Under conditions as given
below, each set of orbits can be given an index. The orbit index in the sample
space will always have a distribution which depends only upon the orbit index
in the parameter space.

Concentrate now on the group $G$ acting on the total parameter space $\Phi$.
Similar concepts can be defined for the other group actions discussed above.
The group $G$ is also assumed to have a topology.

We assume, as is commonly done, that the group operations
$(g_1, g_2) \mapsto g_1 g_2$ and $g \mapsto g^{-1}$ are continuous.
Furthermore, we will assume that the action $(g, \phi) \mapsto \phi g$ is
continuous for $\phi \in \Phi$. An additional condition, discussed in [61], is
that every inverse image of compact sets under the function
$(g, \phi) \mapsto (\phi g, \phi)$ should be compact. A continuous action by a
group $G$ on a space $\Phi$ satisfying this condition is called proper. This
technical condition turns out to have useful properties and is assumed
throughout this paper.
When the group action is proper, the orbits of the group can be proved to
be closed sets relative to the topology of $\Phi$.

For fixed $\phi \in \Phi$, a stability subgroup $H$ of $G$ is defined as
$\{h : \phi h = \phi\}$. These are transformed within orbits of $G$ as
$H \mapsto g^{-1}Hg$.
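
Orbits and stability subgroups are easy to compute in a toy example. The sketch below uses the four-element group of quarter-turn rotations acting on integer points of the plane; this finite group and the names `act`, `orbit` and `stabilizer` are assumptions chosen for illustration, and the topological conditions above play no role for a finite group.

```python
GROUP = [0, 1, 2, 3]   # k stands for rotation by k quarter turns

def act(point, k):
    """Action phi -> phi g: rotate a lattice point by k quarter turns."""
    x, y = point
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def orbit(point):
    """The set of all images of the point under the group."""
    return {act(point, k) for k in GROUP}

def stabilizer(point):
    """The group elements h that leave the point fixed."""
    return [k for k in GROUP if act(point, k) == point]

for p in [(1, 0), (0, 0), (2, 3)]:
    print(p, sorted(orbit(p)), stabilizer(p))
```

The origin forms an orbit by itself and is fixed by the whole group, while every other point has a four-element orbit and a trivial stability subgroup. In both cases the orbit size times the size of the stability subgroup equals the group order.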

Every locally compact group possesses a right-invariant Haar measure
$\nu$ satisfying $\nu(Dg) = \nu(D)$ for $D \subset G$ [46]. This induces a
right-invariant measure on $\Phi$ itself if each stability group $H$ is
compact, which is the case if the action of $G$ on $\Phi$ is proper and the
group is locally compact. The last assertion is proved in ([61],
Theorem 2.3.13(c)). A right-invariant measure $\nu$ on $\Phi$ satisfies by
definition $\nu(Fg) = \nu(F)$ for all (measurable) $F \subset \Phi$ and
$g \in G$.

A.2. On group representation theory. A matrix representation of a group
$G$ is defined as a function $U$ from the group to the set of (here complex)
matrices satisfying $U(gh) = U(g)U(h)$ for all $g, h \in G$. In other words, a
representation is a homomorphism from $G$ to the multiplicative group of
square matrices of a fixed dimension. Any representation $U$ and any fixed
nonsingular matrix $K$ of the same size can be used to construct another
representation $S(g) = KU(g)K^{-1}$. If the group is compact (and also in some
other cases), we can always find such $S$ of minimal block diagonal form, and
at the same time we can take $S$ to be unitary [$S(g)^{\dagger}S(g) = I$]. If
(and only if) the group is Abelian, each minimal block will be one-dimensional.
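
The reduction to minimal blocks can be seen in the simplest Abelian example, the group of planar rotations. In the sketch below, `U` is the usual 2-by-2 rotation matrix, and `K` is a unitary change of basis chosen here (as an assumption for illustration) so that $S(g) = KU(g)K^{-1}$ becomes diagonal with the two one-dimensional blocks $e^{i\theta}$ and $e^{-i\theta}$.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def U(theta):
    """Rotation by angle theta: a representation of the planar rotation group."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

r = 1 / math.sqrt(2)
K = [[r, 1j * r], [r, -1j * r]]        # unitary change of basis
K_inv = [[r, r], [-1j * r, 1j * r]]    # its inverse (the conjugate transpose)

def S(theta):
    """Equivalent representation S(g) = K U(g) K^{-1}."""
    return matmul(matmul(K, U(theta)), K_inv)

t1, t2 = 0.4, 1.1
product = matmul(U(t1), U(t2))         # homomorphism: equals U(t1 + t2)
diagonal = S(t1)                       # diag(e^{i t1}, e^{-i t1}) up to rounding
print(product, U(t1 + t2), diagonal)
```

For a non-Abelian group such as the full rotation group in three dimensions, no change of basis can reduce every irreducible block to dimension one, which is why the two-dimensional spin representation of Section 14 is already minimal.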

An important aspect of this reduction appears if we look upon the matri- 
ces as operators on a vector space: Then each collection of blocks gives an 
invariant vector space under the multiplicative group of matrices, and each 
single minimal block gives an irreducible invariant vector space. For compact 
groups, the irreducible invariant vector spaces will be finite-dimensional. The 
minimal matrices in the blocks are called irreducible representations of the 
group. 

More generally, a class of operators $\{U(g); g \in G\}$ (where $G$ is a
group) on a, possibly infinite-dimensional, vector space is a representation
if $U(gh) = U(g)U(h)$ for all $g, h$. A representation of a compact group
always has a complete reduction in minimal matrix representations as described
above. In particular, this holds for the unitary regular representation
defined on a Hilbert space $L^2(\Phi, \nu)$ by $U_R(g)f(\phi) = f(\phi g)$.
Here $\nu$ is the right-invariant measure for $G$ on $\Phi$.

A useful result is Schur's lemma: 

If $U$ and $U'$ are irreducible representations, and $A$ is a bounded linear
map such that $U(g)A = AU'(g)$ for all $g$, then either $U$ and $U'$ are
isomorphic or $A = 0$. If $U(g)A = AU(g)$ for all $g$, then necessarily
$A = \lambda I$ for some scalar $\lambda$.
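
The second statement of the lemma can be illustrated with a finite group. The sketch below uses the six-element symmetry group of an equilateral triangle in its two-dimensional irreducible representation (rotations by multiples of 120 degrees, with and without a reflection); this choice of group and the test matrix `A` are assumptions made for illustration. Averaging $U(g)AU(g)^{-1}$ over the group produces a matrix that commutes with every $U(g)$, so by Schur's lemma it must be a multiple of the identity.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose; equals the inverse for the orthogonal matrices used here."""
    return [list(row) for row in zip(*a)]

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
I2 = [[1.0, 0.0], [0.0, 1.0]]
R = [[c, -s], [s, c]]            # rotation by 120 degrees
F = [[1.0, 0.0], [0.0, -1.0]]    # reflection in the x-axis
R2 = matmul(R, R)
group = [I2, R, R2, F, matmul(R, F), matmul(R2, F)]

def group_average(a):
    """Average of U(g) A U(g)^{-1} over all six group elements."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for u in group:
        term = matmul(matmul(u, a), transpose(u))
        total = [[total[i][j] + term[i][j] / len(group) for j in range(2)]
                 for i in range(2)]
    return total

A = [[1.0, 2.0], [3.0, 4.0]]     # arbitrary test matrix
A_bar = group_average(A)
print(A_bar)
```

The scalar is fixed by the trace, which conjugation preserves: here the average is $\frac{1}{2}\mathrm{tr}(A)\,I$. Averaging over the rotations alone would not be enough, since the rotation subgroup is Abelian and this two-dimensional representation is reducible for it.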

More on group representations can be found in [9, 23, 31, 40, 47, 55, 62]. 